Abstract

The nascent field of compressed sensing is founded on the fact that high-dimensional signals with "simple 
structure" can be recovered accurately from just a small number of randomized samples. Several specific kinds of 
structures have been explored in the literature, from sparsity and group sparsity to low-rankedness. However, two 
fundamental questions have been left unanswered, namely: What are the general abstract meanings of "structure" and 
"simplicity"? And do there exist universal algorithms for recovering such simple structured objects from fewer samples 
than their ambient dimension? In this paper, we address these two questions. Using algorithmic information theory 
tools such as the Kolmogorov complexity, we provide a unified definition of structure and simplicity. Leveraging 
this new definition, we develop and analyze an abstract algorithm for signal recovery motivated by Occam's Razor. 
Minimum complexity pursuit (MCP) requires just $O(\kappa \log n)$ randomized samples to recover a signal of complexity
$\kappa$ and ambient dimension $n$. We also discuss the performance of MCP in the presence of measurement noise and
with approximately simple signals. 

I. Introduction 

Compressed sensing (CS) refers to a body of techniques that undersample high-dimensional signals, and yet
recover them accurately by exploiting their intrinsic "structure" or "compressibility" [1], [2]. This leads to more
efficient sensing systems that have proved to be valuable in many applications, including cameras [3], magnetic
resonance imaging (MRI) [4], and radar [5]-[7], to name a few. While the promise of compressed sensing has
been to undersample "structured" signals, its premise is still limited to specific instances of "structure" such as
sparsity and low-rankedness. While these notions are important in their own right, the notions of "structure" and
"compressibility" are of course more general than these specific instances. The goal of this paper is to provide a
general notion of structure and exploit it to recover signals from an undersampled set of linear measurements.

Towards this end, we use Kolmogorov complexity, which is a measure of complexity for finite-alphabet sequences
introduced by Solomonoff [8] and Kolmogorov [9]. We define the Kolmogorov information dimension of a real-valued
signal as the growth rate of the complexity of its quantized sequences as the quantization becomes finer.
We show that if the Kolmogorov information dimension of a signal is much smaller than its ambient dimension,
then it can be recovered from fewer measurements than its ambient dimension. Based on Occam's razor [10],
we propose the minimum complexity pursuit (MCP) recovery algorithm. MCP finds the simplest object (in the
Kolmogorov complexity sense) that satisfies the measurement constraints. Roughly speaking, we prove that MCP
is able to recover a signal with "complexity" $\kappa$ using no more than $O(\kappa \log n)$ measurements. Finally, we establish
the robustness of MCP to noise on both the measurements and the signal.

Here is the structure of the paper. Section II describes the notation used in the paper and introduces the
Kolmogorov information dimension. Section III summarizes our main contributions and their implications. Section
IV bounds the Kolmogorov information dimension of several popular classes of signals in CS. Section V compares
our work with the related papers in the literature. Section VI provides the proofs of our main results.
Finally, Section VII concludes the paper.



II. Definitions and problem statement 

A. Notation 

Calligraphic letters such as $\mathcal{A}$ and $\mathcal{B}$ denote sets. For a set $\mathcal{A}$, $|\mathcal{A}|$ and $\mathcal{A}^c$ denote its size and its complement,
respectively. For a sample space $\Omega$ and an event set $\mathcal{A} \subseteq \Omega$, $\mathbb{1}_{\mathcal{A}}$ denotes the indicator function of the event $\mathcal{A}$.
Boldfaced letters denote vectors. For a vector $x \in \mathbb{R}^n$, $x_i$, $\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$, and $\|x\|_\infty = \max_i |x_i|$ denote
the $i$-th component, $\ell_p$ norm, and $\ell_\infty$ norm of $x$, respectively. For $1 \le i \le j \le n$, $x_i^j = (x_i, x_{i+1}, \ldots, x_j)$. Also, to
simplify the notation, $x^j$ denotes $x_1^j$. Uppercase letters are used for both matrices and random variables, and hence
their usage will be clear from the context. For integer $n$, $I_n$ denotes the $n \times n$ identity matrix.

Let $\{0,1\}^*$ denote the set of all finite-length binary sequences, i.e., $\{0,1\}^* = \cup_{n \ge 1} \{0,1\}^n$. Similarly, $\{0,1\}^\infty$
denotes the set of infinite-length binary sequences.

For a real number $x \in [0,1]$, let $[x]_m$ denote its $m$-bit approximation that results from taking the first $m$ bits in
the binary expansion of $x$. In other words, if $x = \sum_{i=1}^{\infty} 2^{-i} (x)_i$, where $(x)_i \in \{0,1\}$, then

$$[x]_m \triangleq \sum_{i=1}^{m} 2^{-i} (x)_i.$$

Similarly, for a vector $x \in [0,1]^n$, define

$$[x]_m \triangleq ([x_1]_m, \ldots, [x_n]_m).$$
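The $m$-bit approximation can be made concrete with a few lines of code. The following is a minimal sketch, assuming ordinary floating-point inputs; the function names `quantize` and `quantize_vector` are illustrative and not part of the paper.

```python
import math

def quantize(x: float, m: int) -> float:
    """Return [x]_m: keep the first m bits of the binary expansion of x in [0, 1]."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    # floor(x * 2^m) / 2^m truncates the binary expansion after m bits.
    # x = 1.0 is mapped to the largest m-bit value (expansion 0.111...).
    k = min(math.floor(x * 2**m), 2**m - 1)
    return k / 2**m

def quantize_vector(xs, m):
    """Apply the m-bit truncation to every component of a vector in [0, 1]^n."""
    return [quantize(x, m) for x in xs]
```

For example, `quantize(0.8125, 2)` keeps the bits `0.11` of `0.1101` and returns `0.75`; in every case the truncation error is at most $2^{-m}$.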



B. Kolmogorov complexity 

The Kolmogorov complexity of a finite-length finite-alphabet sequence $x$ with respect to a universal computer
$U$ is defined as the minimum length over all programs that print $x$ and halt.¹,²

1 See Appendix A or Chapters 2 and 3 of [11] for a general introduction to Kolmogorov complexity.

2 For some technical reasons in our definitions we are considering prefix Turing machines. See Appendix A or Chapters 2 and 3 of [11] for
the definition.
For $x \in \{0,1\}^*$, let $K_U(x)$ denote the Kolmogorov complexity of sequence $x$ with respect to the universal
computer $U$. For a universal computer $U$ and any computer $A$, there exists a constant $c_A$ such that $K_U(x) \le
K_A(x) + c_A$ for all strings $x \in \{0,1\}^*$ [11], [12]. This result is known as the invariance theorem in the field
of algorithmic complexity. Note that the constant $c_A$ is independent of the length of the sequence and hence
can be neglected for sufficiently long $x$. As suggested in [12], we drop the subscript $U$ and let $K(x)$ denote the
Kolmogorov complexity of the binary string $x$. For two finite-alphabet sequences $x$ and $y$, $K(x \mid y)$ is defined as the
length of the shortest program that prints $x$ and halts, given that the universal computer $U$ has access to the sequence
$y$.³ Similarly, the Kolmogorov complexity of an integer $n \in \mathbb{N}$, $K(n)$, is defined as the Kolmogorov complexity of
its binary representation. The following theorem summarizes some of the properties of the Kolmogorov complexity
that will be used throughout the paper. Define

$$\log^* n = \lceil \log_2 n \rceil + 2 \log_2 \max(\lceil \log_2 n \rceil, 1).$$
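As a quick numerical aid, the quantity $\log^* n$ can be evaluated directly. The sketch below assumes the ceiling-based reading of the definition above; the function name `log_star` is illustrative.

```python
import math

def log_star(n: int) -> float:
    """Evaluate log* n = ceil(log2 n) + 2 * log2(max(ceil(log2 n), 1)) for n >= 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    c = math.ceil(math.log2(n))
    # The second term accounts for describing the length of the binary representation.
    return c + 2 * math.log2(max(c, 1))
```

For instance, `log_star(1024)` evaluates to $10 + 2\log_2 10 \approx 16.64$, only slightly more than the 10 bits of the plain binary representation.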

Theorem 1: Let $x, y$ be binary strings of lengths $\ell(x)$ and $\ell(y)$, respectively. Furthermore, let $m, n \in \mathbb{N}$. The
Kolmogorov complexity satisfies the following properties:

i. $K(x \mid \ell(x)) \le \ell(x) + c$,

ii. $K(x, y) \le K(x) + K(y) + c$,

iii. $K(x \mid y) \le K(x)$,

iv. $K(x) \le K(x \mid \ell(x)) + K(\ell(x)) + c$,

v. $K(n) \le \log^* n + c$,

vi. $K(n + m) \le K(n) + K(m) + c$,

where $c$ is a constant independent of $x$, $y$, $n$, and $m$, but might be different from one appearance to another.

The proofs of different parts of this theorem can be found in [11], [12]. However, for the sake of completeness,
we overview the proofs in Appendix B.

Kolmogorov complexity provides a universal measure for compressibility of sequences. If an infinite-length binary
sequence $x$ satisfies

$$\lim_{n \to \infty} \frac{K(x_1, x_2, \ldots, x_n)}{n} = 1,$$

then it satisfies all computable statistical tests for randomness ([12], Theorem 14.5.2). For instance, the number of
zeros is "close to" the number of ones in this sequence. Furthermore, if the Kolmogorov complexity of x is smaller 
than the ambient dimension, then it means that we can compress x (represent it with fewer bits); the encoder returns 
the shortest program that has generated x and the decoder is the universal Turing machine that generates x from 
this short program. 
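Kolmogorov complexity itself is not computable, but any concrete compressor yields an upper bound on it (up to an additive constant for the decompressor). The sketch below uses `zlib` purely as an illustrative stand-in, which is an assumption of this example and not a tool used in the paper, to contrast a highly structured sequence with a random-looking one.

```python
import random
import zlib

def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib encoding: a crude, computable upper-bound proxy."""
    return 8 * len(zlib.compress(data, 9))

n_bytes = 4096
structured = bytes(n_bytes)  # all-zero sequence: very short description
rng = random.Random(0)       # fixed seed so the example is reproducible
random_like = bytes(rng.getrandbits(8) for _ in range(n_bytes))

ratio_structured = compressed_bits(structured) / (8 * n_bytes)
ratio_random = compressed_bits(random_like) / (8 * n_bytes)
```

The structured sequence compresses to a small fraction of its length, while the random-looking one does not compress at all, mirroring the normalized-complexity limit of 1 discussed above.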

3 Note that $K(x \mid y)$ is often defined as $K(x \mid y, p_y)$, where $p_y$ is the shortest program that generates $y$. This formulation provides symmetry
in the definition of algorithmic mutual information. But we will not use this definition in this paper.
C. Problem statement

1) Compressed sensing versus compression: Algorithmic information theory is mainly concerned with the compression
of binary sequences (or finite-alphabet sequences). The objective of a compression algorithm is to represent
these sequences using as few bits as possible. However, in this paper we are interested in the problem of CS, where
the goal is to reconstruct a signal $x_o \in \mathbb{R}^n$ from its lower-dimensional linear projections $y_o = A x_o$, where
$A \in \mathbb{R}^{d \times n}$ with $d < n$. Note two distinguishing features of this problem. First, since the system of equations is
underdetermined, perfect reconstruction is not always possible. Therefore some knowledge of the structure of $x_o$ is
required for recovering it from the measurements $y_o$. Second, the problem is different from the traditional problem
of algorithmic information theory that considers the compression in terms of bits. Hence, this problem requires a
new perspective on the Kolmogorov complexity of real-valued signals.

2) Kolmogorov information dimension: Let the components of $x_o \in [0,1]^n$ have terminating binary expansions.
We define the Kolmogorov complexity of $x_o$ as the length of the shortest program that prints the binary expansion
of $x_o$ and halts. By definition, the Kolmogorov complexity of sequences with nonterminating binary expansion is
infinite. Following the ideas in algorithmic information theory, one can consider the "structure" of $x_o$ to be the
shortest program that generates it [13]. The shorter the program, the more structured the signal. But this assumption
is very restrictive, since most real-valued signals have infinite Kolmogorov complexity. The first step to remedy this
issue is to calculate the Kolmogorov complexity of a "quantized" version of $x_o$. For $x = (x_1, x_2, \ldots, x_n) \in [0,1]^n$,
define the Kolmogorov complexity of $x$ at resolution $m$ as

$$K^{[\cdot]_m}(x) \triangleq \inf_{u \in [0,1]^n} \left\{ K(u \mid n, m) : \|x - u\|_\infty \le 2^{-m} \right\}. \tag{1}$$

We can provide an upper bound for $K^{[\cdot]_m}(x)$ by considering certain instances of $u$. For example,

$$K^{[\cdot]_m}(x) \le K([x]_m \mid m, n).$$

Note that $K^{[\cdot]_m}(x)$ is defined as the Kolmogorov complexity of the "quantized" version of $x$ conditioned on $m$
and $n$, because it is natural to assume that the encoder and decoder have access to both the ambient dimension $n$ and
the quantization level $m$. For most real-valued signals this quantity goes to infinity as $m$ approaches infinity, but
the growth rate is proportional to $m$. Therefore, we consider a normalized version of the Kolmogorov complexity.

Definition 1: The Kolmogorov information dimension of $(x_1, x_2, \ldots, x_n) \in [0,1]^n$ at resolution $m$ is defined as

$$\kappa_{m,n}(x) \triangleq \frac{K^{[\cdot]_m}(x_1, x_2, \ldots, x_n)}{m}.$$

In general the number of quantization levels $m$ may depend on the ambient dimension $n$. The division of $K^{[\cdot]_m}(x)$
by the resolution level $m$ ensures that for a fixed value of $n$ this quantity is always finite.
Lemma 1: Let $x \in [0,1]^n$. Then we have

$$\kappa_{m,n}(x) \le n + \frac{c}{m},$$

where $c$ is a positive constant independent of $m$, $n$, and $x$. In particular,

$$\limsup_{m \to \infty} \kappa_{m,n}(x) \le n.$$



Proof: We first note that

$$K^{[\cdot]_m}(x) = \inf_{u \in [0,1]^n} \left\{ K(u \mid n, m) : \|x - u\|_\infty \le 2^{-m} \right\} \le K([x]_m \mid m, n).$$

Now, we derive an upper bound on $K([x]_m \mid m, n)$ by providing a program that describes $[x]_m$ conditioned on knowing $m$
and $n$. Consider the program that first explains the structure of the sequence as consisting of $n$ $m$-bit subsequences
and then identifies the bits. Since the computer has access to $m$ and $n$, a constant number of bits (independent of
$m$ or $n$) is sufficient for specifying the structure, and it then requires $mn$ more bits to specify the components
$[x_i]_m$. Therefore, overall

$$\kappa_{m,n}(x) \le \frac{K([x_1]_m, [x_2]_m, \ldots, [x_n]_m \mid m, n)}{m} \le \frac{nm + c}{m}.$$

The second part of the lemma is a straightforward result of the first part. □
Remark 1: Note that the existence of a finite upper bound on $K^{[\cdot]_m}(x)$ ensures that the infimum in (1) is achieved.
This is due to the fact that the number of sequences $(u_1, u_2, \ldots, u_n)$ that have $K(u_1, u_2, \ldots, u_n) \le mn + c$ is
finite. In the rest of the paper we denote the minimizing vector by $\phi_m(x)$, i.e.,

$$\phi_m(x) = \arg\min_{u \in [0,1]^n} \left\{ K(u \mid n, m) : \|x - u\|_\infty \le 2^{-m} \right\}.$$

The following examples clarify some of the properties of the Kolmogorov information dimension.
Example 1: (Sparse signals) Consider a $k$-sparse signal $x \in [0,1]^n$. That is, $x$ has at most $k$ nonzero coefficients.
For any given $\delta > 0$, the Kolmogorov information dimension of $x$ at resolution $m$, for large enough values of $m$,
is upper bounded by $k + \delta$. See Section IV-A for the proof of this claim.

Example 2: (Low-rank matrices) Let $X$ denote an $M \times N$ real-valued matrix of rank $r$ such that $\sigma_{\max}(X) \le 1$.⁴ For any
given $\delta > 0$, the Kolmogorov information dimension of $X$ at resolution $m$ is upper bounded by $r(M + N + 1) + \delta$,
for sufficiently large values of $m$. See Section IV-E for the proof of this claim.

Let U[a, b] denote the uniform distribution between a and b. Also, let X ~ Bern(p) represent a Bernoulli random 
variable with P(X = 1) = 1 — P(X = 0) = p. The following proposition lets us construct the third example that 
represents an unstructured signal. 

Proposition 1: Let $\{X_i\}_{i=1}^{\infty} \overset{\text{i.i.d.}}{\sim} U[0,1]$. Then, for any $n \ge 1$,

$$\lim_{m \to \infty} \frac{1}{mn} K^{[\cdot]_m}(X_1, X_2, \ldots, X_n) = 1$$

in probability.

4 As long as all the singular values are upper bounded by a constant the statement of this example holds. For notational simplicity we
choose 1 as the upper bound for the singular values.

August 30, 2012 DRAFT

Proof: For $i \in \{1, 2, \ldots\}$, let $X_i = \sum_{j=1}^{\infty} (X_i)_j 2^{-j}$, where $(X_i)_j \in \{0,1\}$. Then $\{(X_i)_j\}_{j \ge 1} \overset{\text{i.i.d.}}{\sim} \mathrm{Bern}(1/2)$ [14].
Let $U^n = \phi_m(X^n)$. Since $|U_i - X_i| \le 2^{-m}$, then, for $j \le m - 1$, $(X_i)_j = (U_i)_j$. Therefore,

$$\frac{K(U^n \mid m, n)}{m} \ge \frac{K\left(\{((U_i)_1, \ldots, (U_i)_m)\}_{i=1}^{n} \mid m, n\right) - c}{m} = \frac{K\left(\{((X_i)_1, \ldots, (X_i)_m)\}_{i=1}^{n} \mid m, n\right) - c}{m}. \tag{2}$$

Theorem 14.5.3 in [12] states that the normalized Kolmogorov complexity of a sequence of i.i.d. $\mathrm{Bern}(1/2)$ bits
converges to 1 in probability. In other words,

$$\lim_{m \to \infty} \frac{K\left(\{(X_i)_1, (X_i)_2, \ldots, (X_i)_m\}_{i=1}^{n} \mid m, n\right)}{mn} = 1 \tag{3}$$

in probability. Therefore, combining Lemma 1 and (3) yields the desired result. □
Example 3: If the random variables $\{X_i\}_{i=1}^{n} \overset{\text{i.i.d.}}{\sim} U[0,1]$, then

$$\lim_{m \to \infty} \frac{K^{[\cdot]_m}(X_1, X_2, \ldots, X_n)}{m} = n$$

in probability. The proof follows directly from Proposition 1.

These examples demonstrate that, at least in cases where the ambient dimension is fixed and the quantization levels
grow without bound, the Kolmogorov information dimension is much smaller than the ambient dimension for the
two well-known structured signals in Examples 1 and 2, and is equal to the ambient dimension for the unstructured
signal in Example 3. We present more examples of structured signals and the corresponding upper bounds on their
Kolmogorov information dimension in Section IV.



3) CS of real-valued signals: Consider the problem of recovering a structured real-valued signal $x_o = (x_{o,1}, x_{o,2}, \ldots, x_{o,n})$
with $\kappa_{m,n}(x_o) = O(n^{1-\alpha})$, for some $\alpha > 0$ and proper choice of $m$, from an underdetermined set of linear
equations $y_o = A x_o$, where $y_o \in \mathbb{R}^d$ and $d < n$. We follow Occam's Razor principle and, among all the solutions
of $y_o = A x$, seek the solution that has the minimum complexity, i.e.,

$$\arg\min_{x} K^{[\cdot]_m}(x) \quad \text{s.t.} \quad Ax = y_o. \tag{4}$$

We call this algorithm minimum complexity pursuit or MCP. MCP has a free parameter $m$ whose effect on the
performance of the algorithm will be discussed in detail later. We will show that MCP can recover $x_o$ from fewer
measurements than the ambient dimension of the signal. This result extends the scope of CS from the class of sparse
signals or the class of low-rank matrices to the class of all signals with small Kolmogorov information dimension.
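The Occam's razor search in (4) cannot be implemented as stated because Kolmogorov complexity is not computable. The following toy sketch only illustrates the shape of the search: it enumerates all vectors on a coarse grid and replaces the complexity by a hypothetical computable stand-in (here the number of nonzero entries). The grid, the stand-in `count_nonzero`, and all names are assumptions of this example, not the paper's algorithm.

```python
import itertools
import random

def matvec(A, u):
    """Plain-Python matrix-vector product A u."""
    return [sum(a * b for a, b in zip(row, u)) for row in A]

def toy_mcp(A, y, levels, complexity, tol=1e-9):
    """Return the lowest-complexity grid vector u with A u = y (up to tol)."""
    n = len(A[0])
    best, best_cost = None, None
    for u in itertools.product(levels, repeat=n):
        residual = max(abs(r - s) for r, s in zip(matvec(A, u), y))
        if residual <= tol:
            cost = complexity(u)
            if best_cost is None or cost < best_cost:
                best, best_cost = u, cost
    return best

def count_nonzero(u):
    """Stand-in complexity measure (an assumption): number of nonzero entries."""
    return sum(1 for v in u if v != 0.0)

rng = random.Random(0)                       # fixed seed for reproducibility
n, d = 6, 3                                  # d < n: underdetermined system
levels = (0.0, 0.25, 0.5, 0.75)              # 2-bit quantization grid
x_o = (0.0, 0.0, 0.5, 0.0, 0.0, 0.0)         # simple (1-sparse) signal
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)]
y_o = matvec(A, x_o)
x_hat = toy_mcp(A, y_o, levels, count_nonzero)
```

In this toy setting the exhaustive search visits $4^6 = 4096$ candidates and returns the original signal from only three measurements; the exponential cost of the enumeration is one reason practical approximations are a separate question.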

In this paper we ignore the practical issues of approximating the MCP algorithm. In an independent work, [15],
[16] have considered a practical version of this algorithm and provided promising results in that direction. Further
investigation of the practical issues is left for future research.

III. Our contributions 

A. Recovery in the noiseless setting 

Suppose that $A \in \mathbb{R}^{d \times n}$, where $A_{ij}$ are i.i.d. $\mathcal{N}(0,1)$, and assume that $y_o = A x_o$. Let $\hat{x}_o = \hat{x}_o(y_o, A)$ denote
the output of (4) to the inputs $y_o$ and $A$. For an infinite sequence $x_o = (x_{o,1}, x_{o,2}, \ldots)$, $x_o^n$ denotes the first $n$
elements of $x_o$, i.e., $x_o^n = (x_{o,1}, x_{o,2}, \ldots, x_{o,n})$.

Theorem 2: Let $x_o = (x_{o,1}, x_{o,2}, \ldots) \in [0,1]^\infty$. For integers $m$ and $n$, let $\kappa_{m,n} = \kappa_{m,n}(x_o^n)$ denote the information
dimension of $x_o^n$ at resolution $m$. Then, for any $\tau_n < 1$ and $t > 0$ we have

$$P\left( \|x_o^n - \hat{x}_o^n\|_2 > \left( \tau_n^{-1}\left(\sqrt{n/d} + 1 + t\right) + 1 \right) \sqrt{n 2^{-2m+2}} \right) \le 2^{2\kappa_{m,n} m} e^{\frac{d}{2}\left(1 - \tau_n^2 + 2\log \tau_n\right)} + e^{-\frac{d}{2} t^2}.$$

The proof is presented in Section VI-B. We consider several interesting corollaries of this theorem for high-dimensional
problems.

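Corollary 1: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in [0,1]^\infty$ and $m = m_n = \lceil \log n \rceil$. Let $\kappa_n = \kappa_{m_n,n}$. Then if
$d_n = \lceil \kappa_n \log n \rceil$, for any $\epsilon > 0$, we have

$$P\left( \|x_o^n - \hat{x}_o^n\|_2 > \epsilon \right) \to 0$$

as $n \to \infty$.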

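Proof: For $m = m_n = \lceil \log n \rceil$ and $d_n = \lceil \kappa_n \log n \rceil$, we have

$$\left( \tau_n^{-1}\left(\sqrt{n d_n^{-1}} + t + 1\right) + 1 \right) \sqrt{n 2^{-2m_n+2}} \le 2 \left( \tau_n^{-1}\left( \sqrt{\lceil \kappa_n \log n \rceil^{-1}} + (t+1) n^{-1/2} \right) + n^{-1/2} \right). \tag{5}$$

Hence, fixing $t > 0$ and setting $\tau_n = \tau = 0.04$, for any $\epsilon > 0$, if $n$ is large enough, then

$$\left( \tau^{-1}\left(\sqrt{n d_n^{-1}} + t + 1\right) + 1 \right) \sqrt{n 2^{-2m+2}} < \epsilon.$$

Therefore, for $n$ large enough,

$$P\left( \|x_o^n - \hat{x}_o^n\|_2 > \epsilon \right) \le 2^{2\kappa_n \log n} e^{\frac{\kappa_n \log n}{2}\left(1 - \tau^2 + 2\log\tau\right)} + e^{-\frac{d_n}{2} t^2} \le e^{1.4 \kappa_n \log n} e^{-2.7 \kappa_n \log n} + e^{-\frac{d_n}{2} t^2}, \tag{6}$$

which shows that $P(\|x_o^n - \hat{x}_o^n\|_2 > \epsilon) \to 0$ as $n \to \infty$. □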
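The two numerical constants 1.4 and 2.7 in the last step of the proof can be checked directly. The short sketch below, with illustrative variable names, assumes natural logarithms in the exponents and the choice of 0.04 for the free parameter made in the proof.

```python
import math

tau = 0.04

# 2^{2 kappa log n} = exp((2 ln 2) * kappa log n): rate of the counting term.
counting_rate = 2 * math.log(2)

# exp((1/2) * (1 - tau^2 + 2 ln tau) * kappa log n): rate of the concentration term.
decay_rate = -0.5 * (1 - tau**2 + 2 * math.log(tau))

# The bound vanishes because the decay rate exceeds the counting rate.
net_rate = decay_rate - counting_rate
```

Here `counting_rate` is about 1.386 (at most 1.4) and `decay_rate` is about 2.72 (at least 2.7), so the product of the two exponentials decays as the complexity term grows.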
According to Corollary 1, if the complexity of the signal is less than $\kappa$, then the number of linear measurements
required for its asymptotically perfect recovery is, roughly speaking, on the order of $\kappa \log n$. In other words, the
number of measurements is proportional to the complexity of the signal and only logarithmically proportional to
its ambient dimension.

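Corollary 2: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in [0,1]^\infty$ and $m_n = \lceil \log n \rceil$. Let $\kappa_{m_n,n} = \kappa_n \to \infty$ as $n \to \infty$.
Then, if $d_n = 3\kappa_n$, for $\epsilon > 0$ we have

$$P\left( \frac{1}{\sqrt{n}} \|x_o^n - \hat{x}_o^n\|_2 > \epsilon \right) \to 0$$

as $n \to \infty$.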

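Proof: Setting $\tau_n = n^{-1/2}$, $m = m_n = \lceil \log n \rceil$, and $d = d_n = \lceil 3\kappa_n \rceil$ in Theorem 2, it follows that

$$P\left( \frac{1}{\sqrt{n}} \|x_o^n - \hat{x}_o^n\|_2 > 2\sqrt{d_n^{-1}} + 2(t+1) n^{-1/2} + 2 n^{-1} \right) \le 2^{2\kappa_n \log n} e^{\frac{3}{2}\kappa_n\left(1 - n^{-1} - \log n\right)} + e^{-\frac{d_n}{2} t^2}$$
$$= e^{-\left(\frac{3}{2} - 2\log 2\right)\kappa_n \log n + \kappa_n\left(\frac{3}{2} - \frac{3}{2} n^{-1}\right)} + e^{-\frac{d_n}{2} t^2}.$$

Since $\frac{3}{2} - 2\log 2 > 0$, for any $\epsilon > 0$ and $n$ large enough, we have

$$2\sqrt{d_n^{-1}} + 2(t+1) n^{-1/2} + 2 n^{-1} < \epsilon.$$

It follows that $P\left(\frac{1}{\sqrt{n}}\|x_o^n - \hat{x}_o^n\|_2 > \epsilon\right) \to 0$ as $n \to \infty$. □
In other words, if we are interested in the normalized mean square error, $\frac{1}{\sqrt{n}}\|x_o^n - \hat{x}_o^n\|_2$, rather than the $\ell_2$-norm
error $\|x_o^n - \hat{x}_o^n\|_2$, then $3\kappa_n$ measurements are sufficient for asymptotically accurate recovery.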

It is worth noting that, while $m_n$ is set to $\log n$ in Corollaries 1 and 2, it can be considered as a free parameter
of the MCP algorithm. Theorem 2 describes the trade-off of the parameters. If we fix all the other parameters in
Theorem 2, then increasing $m$ decreases the reconstruction mean square error, but it also decreases
the probability of correct recovery.

B. Recovery in the presence of Gaussian noise in measurements 

In the previous section, we considered the case of recovering low-complexity signals from their noise-free linear 
measurements. In this section, we extend those results to the case of noisy measurements, where $y_o = A x_o + w$,
with $w \sim \mathcal{N}(0, \sigma^2 I_d)$. Assuming that the complexity of the signal is known at the reconstruction stage, we consider
the following reconstruction algorithm:

$$\arg\min_{x} \|Ax - y_o\|_2^2 \quad \text{s.t.} \quad K^{[\cdot]_{m_n}}(x) \le \kappa_n m_n. \tag{7}$$

Note that $\kappa_n m_n$ is an upper bound on the Kolmogorov complexity of $x_o$ at resolution $m_n$. We call this algorithm
low-complexity least squares (LLS). Our quest in this section is to find the number of measurements required to
make the LLS algorithm specified by (7) robust to noise.

Theorem 3: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in [0,1]^\infty$. For integers $m$ and $n$, let $\kappa_{m,n}$ denote the information
dimension of $x_o^n$ at resolution $m$. If $m = m_n = \lceil \log n \rceil$, $d = 8 r \kappa_{m,n} m$, and $\rho = (1 - \sqrt{r^{-1}})^2 / 2$, where $r > 1$,
then



& n o\\l > {2Km ^ )a ) () ,8) 



as n 



The proof is presented in Section VI-C. Note that, since the elements of the matrix $A$ are i.i.d. $\mathcal{N}(0,1)$, as the
ambient dimension $n$ grows, so does the signal-to-noise ratio per measurement. In order to have a fixed signal-to-noise
ratio per measurement, one can draw the elements of $A$ i.i.d. from $\mathcal{N}(0, 1/n)$. In this case, it is not difficult to see
that the normalized mean square error — X™ \\%/n < £grsigp)g m probability. 

C. Recovery in the presence of deterministic noise 

Consider again the measurement system we introduced in the last section: $y_o = A x_o + w$, where $w$ represents
measurement noise. Unlike the previous section, assume that the noise is deterministic and has bounded $\ell_2$-norm,
i.e., $\|w\|_2 \le \epsilon$. This type of noise provides a good model for quantization noise on the measurements, among other
practical nonidealities. Note that unlike the case of stochastic noise, deterministic noise can be adversarial. We
prove that the LLS algorithm considered in (7) provides a sufficiently accurate estimate of $x_o$ even in the presence
of such noise.

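Theorem 4: Let $x_o = (x_{o,1}, \ldots, x_{o,n}) \in [0,1]^n$. Let $\kappa_{m,n}$ denote the information dimension of $x_o$ at resolution
$m$. Then, for any $\tau_n < 1$ and $t > 0$,

$$P\left( \|\hat{x}_o - x_o\|_2 > \left( \frac{\sqrt{n/d} + 1 + t}{\tau_n} + 1 \right) \sqrt{n 2^{-2m+2}} + \frac{2\epsilon}{\tau_n \sqrt{d}} \right) \le 2^{2\kappa_{m,n} m} e^{\frac{d}{2}\left(1 - \tau_n^2 + 2\log\tau_n\right)} + e^{-\frac{d}{2} t^2}.$$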

Since the proof of this theorem is very similar to the proof of Theorem 2, it is not included in the paper. Here the
probability of accurate recovery is the same as in Theorem 2, and under similar conditions this probability converges
to one. The reconstruction error has two terms. The first term, which is proportional to $\sqrt{n 2^{-2m+2}}$, is again similar to Theorem
2 and under similar conditions converges to zero. The second term in the error, which is proportional to $\epsilon / (\tau_n \sqrt{d})$, is due to the noise in the
measurements. As the number of measurements increases, this term converges to zero. This is due to the fact that, since
$A_{ij} \sim \mathcal{N}(0,1)$, as we increase the number of measurements, the energy of the signal per measurement is fixed.
But since the total amount of energy of the noise is considered to be constant, the average noise per measurement
decreases as $1/\sqrt{d}$.

D. Recovery of approximately low-complexity signals 

In the last three sections, we considered recovering "low-complexity" signals from their linear (noisy or noise-free)
projections. However, most applications feature signals that are not of exactly low complexity but rather are
"close" to low-complexity signals. In this section, we discuss this more general setting. Assume that the original
signal $x_o$ is not low-complexity but is close to the low-complexity signal $\tilde{x}_o$, i.e., $\|x_o - \tilde{x}_o\|_2 \le \epsilon_n$ with $\epsilon_n = o(1)$.
Again, let $y_o = A x_o$. Consider the following reconstruction algorithm for recovering $x_o$ from its noisy linear
measurements $y_o$:

$$\min_{x} \|y_o - Ax\|_2^2 \quad \text{s.t.} \quad K^{[\cdot]_m}(x) \le \kappa_{m,n} m. \tag{9}$$

Assume that $A \in \mathbb{R}^{d \times n}$ and $A_{ij}$ are i.i.d. $\mathcal{N}(0,1)$. Let $\hat{x}_o = \hat{x}_o(y_o, A)$ denote the solution of (9).

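Theorem 5: Assume that there exists $\tilde{x}_o \in \mathbb{R}^n$ such that $\|x_o - \tilde{x}_o\|_2 \le \epsilon_n$ and $K^{[\cdot]_m}(\tilde{x}_o) \le \kappa_{m,n} m$. Let
$y_o = A x_o$, where $A$ is a $d \times n$ matrix with i.i.d. $\mathcal{N}(0,1)$ entries, and let $\hat{x}_o$ denote the minimizer of (9). Then, for
any $0 < \tau_n < 1$ and $t > 0$,

$$P\left( \|x_o - \hat{x}_o\|_2 > \left( \frac{\sqrt{n/d} + 1 + t}{\tau_n} + 1 \right) \sqrt{n 2^{-2m+2}} + 2\,\frac{\sqrt{n/d} + 1 + t}{\tau_n}\,\epsilon_n \right) \le 2^{2\kappa_{m,n} m} e^{\frac{d}{2}\left(1 - \tau_n^2 + 2\log\tau_n\right)} + e^{-\frac{d}{2} t^2}. \tag{10}$$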



The proof is presented in Section VI-D. There are two error terms in (10). The first one, which is proportional to $\sqrt{n 2^{-2m+2}}$,
is the reconstruction error due to the quantization performed in the calculation of Kolmogorov complexity. The
second term, which is proportional to $\epsilon_n$, is due to the fact that the signal $x_o$ is not of exactly low complexity. The following corollary simplifies
the statement of the theorem for some special useful cases.

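Corollary 3: Assume $x_o = (x_{o,1}, x_{o,2}, \ldots) \in [0,1]^\infty$ satisfies all the conditions of Theorem 5 with $m_n =
\lceil \log n \rceil$. Let $d = d_n = \lceil \kappa_n \log n \rceil$, where $\kappa_n = \kappa_{m_n,n}$. If $d_n = \omega(n \epsilon_n^2)$, then for any $\epsilon > 0$

$$P\left( \|x_o^n - \hat{x}_o^n\|_2 > \epsilon \right) \to 0$$

as $n \to \infty$.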

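Proof: We showed in the proof of Corollary 1 that by fixing $t > 0$ and setting $\tau_n = \tau = 0.04$,

$$2^{2\kappa_n \log n} e^{\frac{\kappa_n \log n}{2}\left(1 - \tau^2 + 2\log\tau\right)} + e^{-\frac{d_n}{2} t^2} \to 0.$$

We also showed that in the same setting we have

$$\left( \frac{\sqrt{n/d_n} + 1 + t}{\tau} + 1 \right) \sqrt{n 2^{-2m+2}} \to 0.$$

It is straightforward to show that since $d_n = \omega(n \epsilon_n^2)$ we have

$$\frac{\sqrt{n/d_n} + 1 + t}{\tau}\,\epsilon_n \to 0,$$

and the result follows. □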
E. Other measurement matrices

For the sake of clarity, the results presented so far have focused on i.i.d. Gaussian measurement matrices. However, 
the results can be extended to the more general class of i.i.d. subgaussian matrices. 

Definition 2: A random variable $X$ is called subgaussian if and only if there exist two constants $c_1, c_2 > 0$ such
that

$$P(|X| > t) \le c_1 e^{-c_2 t^2}.$$

Such a random variable is denoted by $SG(c_1, c_2)$.

Our goal in this section is to show how our results can be extended to the problem of CS with i.i.d. subgaussian
measurement matrices. Our main conclusion is that the results presented for Gaussian matrices continue to hold
for subgaussian matrices except for slight changes in the constants. However, as will be discussed later in Section
VI-E, the proof techniques are different from those for Gaussian matrices. To show these differences we extend the
result of Theorem 2 to subgaussian matrices. Similar arguments can be used for other extensions. As before we
consider the problem of recovering $x_o$ from linear measurements $y_o = A x_o$, where the elements of the matrix are
i.i.d. $SG(c_1, c_2)$. Also assume that $E[A_{ij}] = 0$ and $E[A_{ij}^2] = 1$.

Theorem 6: Let $x_o = (x_{o,1}, x_{o,2}, \ldots) \in [0,1]^\infty$. For integers $m$ and $n$, let $\kappa_{m,n}$ denote the Kolmogorov
information dimension of $x_o^n$ at resolution $m$. Then, there exist three constants $c_1'$, $c_2'$, and $c_3$ depending only
on ci and c 2 such that for any 1 — — < r„ < 1 and t > 0, 



P (j|x -x || 2 > (r- 1 ^ + l)n/d+l + l)Vn2- 2 ™+ 2 



< 2 ZKm -" m e ^ — +e~ c i". 

Theorem 6 shows that, by choosing $m = \lceil \log n \rceil$, $O(\kappa_{m,n} \log n)$ measurements remain sufficient for asymptotically
accurate recovery. But, as expected, the constants might be different from those in Theorem 2.

F. Discussion 

The LLS algorithms proposed in (7) and (9), corresponding to the cases when noise is present either in the measurements
or in the signal, both assume the knowledge of an upper bound on the complexity of the signal. While such
knowledge might be available or estimated in some applications, in many cases it is not straightforward to acquire
it. In those cases, one might change the formulation of the MCP as follows:

$$\arg\min_{x} K^{[\cdot]_m}(x) \quad \text{s.t.} \quad \|Ax - y_o\|_2 \le z_n. \tag{11}$$

We call this new algorithm relaxed MCP or R-MCP. In this new optimization problem the challenge is to set the
parameter $z_n$ properly. The value of this parameter should be set according to the noise level present in the system.
For instance, if we employ $z_n = (\sqrt{n} + (t+1)\sqrt{d})\,\epsilon_n$ and $z_n = \epsilon$ for the approximately low-complexity signals case
(corresponding to Section III-D) and exactly sparse signal in the presence of deterministic noise (corresponding to
Section III-C), respectively, then we obtain results that are exactly the same as those stated in Theorems 4 and 5.
Since the proofs are very similar to the proofs of Theorems 4 and 5, we skip them here. In the case of stochastic
noise (corresponding to Section III-B), it is not clear if this new formulation provides a bound similar to Theorem
3. This problem is deferred for future research.

IV. KOLMOGOROV DIMENSION OF CERTAIN CLASSES OF FUNCTIONS 

It is well known that the Kolmogorov complexity of a sequence is not computable ([12], Section 14.7). However,
it is often possible to provide upper bounds on the Kolmogorov complexity. In this section, we consider several
standard classes of functions and provide upper bounds on their Kolmogorov information dimension. Based on
these upper bounds, one can use Theorems 2 and 5 to calculate the number of linear measurements required by the
MCP to recover them. These examples demonstrate the connection between the results of Section III and the CS
framework explained in the introduction.

A. Sparse signals 

A class of signals that has played a starring role in compressed sensing is the class of $k$-sparse signals. The
following proposition provides an upper bound on the Kolmogorov information dimension of such signals.
Proposition 2: Let the signal $x_o = (x_{o,1}, x_{o,2}, \ldots, x_{o,n})$ be $k$-sparse, i.e., $\|x_o\|_0 \le k$. Then

$$\kappa_{m,n}(x_o) \le k + \frac{k \log^* n + \log^* k + c}{m}.$$

Proof: Consider the following program for describing $[x_o]_m$. First, use a program of constant length to describe the
structure of the signal as "sparse" and the ordering of the rest of the information, and the length of the sequence and the
resolution.⁵ Next, code the sparsity level $k$ with $\log^* k$ bits, and spend $k \log^* n$ more bits to code the locations of the
$k$ non-zero elements. Finally, use $km$ more bits to describe the quantized magnitudes of the non-zero coefficients.
Therefore, we have

$$\kappa_{m,n}(x_o) \le \frac{K([x_o]_m \mid n, m)}{m} \le k + \frac{k \log^* n + \log^* k + c}{m},$$

where $c$ is a constant independent of $x_o$, $m$, and $n$. □
In most of our analysis in this paper we consider the case $m = \log n$. It is straightforward to confirm that in that
case, for any $\delta > 0$, $\kappa_{m,n}(x_o) \le 2k(1 + \delta)$, for sufficiently large $n$.
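The approach of the bound in Proposition 2 to $2k$ when $m = \lceil \log_2 n \rceil$ can be observed numerically. The sketch below is only illustrative: the unknown constant $c$ is set to zero by assumption, and the function names are not from the paper.

```python
import math

def log_star(n: int) -> float:
    """log* n = ceil(log2 n) + 2 * log2(max(ceil(log2 n), 1))."""
    c = math.ceil(math.log2(n))
    return c + 2 * math.log2(max(c, 1))

def sparse_dimension_bound(k: int, n: int, m: int, c: float = 0.0) -> float:
    """Upper bound k + (k log* n + log* k + c) / m on the information dimension.

    The constant c is unknown in general; c = 0 is an assumption for illustration.
    """
    return k + (k * log_star(n) + log_star(k) + c) / m

k = 10
moderate = sparse_dimension_bound(k, 2**20, 20)    # n = 2^20, m = log2 n
large = sparse_dimension_bound(k, 2**1000, 1000)   # n = 2^1000, m = log2 n
```

With $k = 10$, the bound is about 24.7 for $n = 2^{20}$ and about 20.2 for $n = 2^{1000}$, so the excess over $2k$ shrinks slowly, at the rate of $\log\log n / \log n$.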

5 Note that in calculating the information dimension we assume that n and m are given to the universal computer. Otherwise we would need 
log* n and log* m bits to describe them to the machine. 
B. Power law compressible signals

While sparse signals have played an important role in the theory of compressed sensing, it is well known that
they rarely occur in practice. More accurate models assume that either the signal's coefficients decay at a specified
rate, or the signal belongs to an $\ell_p$ ball with $p < 1$ [1], i.e., the signal belongs to the set

$$\mathcal{B}_p^n = \{x \in \mathbb{R}^n : \|x\|_p \le 1\}.$$

For $x_o \in \mathcal{B}_p^n$, let $(x_{o,(1)}, x_{o,(2)}, \ldots, x_{o,(n)})$ denote the permuted version of $x_o$ such that $|x_{o,(1)}| \ge |x_{o,(2)}| \ge \cdots \ge
|x_{o,(n)}|$. It is straightforward to show that $|x_{o,(i)}| \le i^{-1/p}$, i.e., it is power law compressible. Therefore, if we just
keep the $k$ largest coefficients of this signal and set the rest to zero, the resulting $k$-sparse vector $\tilde{x}_o$ satisfies
$\|x_o - \tilde{x}_o\|_2 \le k^{\frac{1}{2} - \frac{1}{p}}$. In Section IV-A we derived an upper bound for the Kolmogorov information dimension of
$\tilde{x}_o$. Proposition 3 follows from this bound and Corollary 3.
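The tail estimate for power law compressible signals can be sanity-checked numerically. The sketch below assumes the worst-case envelope in which the $i$-th largest magnitude equals $i^{-1/p}$, and compares the $\ell_2$ norm of the discarded tail with $k^{1/2 - 1/p}$; the function names are illustrative.

```python
import math

def tail_l2(p: float, n: int, k: int) -> float:
    """l2 norm of the entries k+1..n of the envelope |x_(i)| = i^(-1/p)."""
    return math.sqrt(sum(i ** (-2.0 / p) for i in range(k + 1, n + 1)))

def tail_bound(p: float, k: int) -> float:
    """Claimed bound k^(1/2 - 1/p) on the k-term approximation error."""
    return k ** (0.5 - 1.0 / p)

p, n = 0.5, 10_000
errors = {k: (tail_l2(p, n, k), tail_bound(p, k)) for k in (1, 10, 100)}
```

For $p = 0.5$ the discarded tail stays below the bound for each tested $k$, and both shrink quickly as $k$ grows, which is why a modest number of retained coefficients suffices.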

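Proposition 3: Let $x_o \in \mathcal{B}_p^n$. For $n \in \mathbb{N}$, let $y_o^n = A x_o^n$, where $A$ is a $d_n \times n$ random matrix with i.i.d. $\mathcal{N}(0,1)$
entries. Set $d_n = \lceil n^{p/2} \log n \rceil$. Let $\hat{x}_o^n$ denote the minimizer of (9) with $m = \lceil \log n \rceil$ and $\kappa_{m,n} = 3 n^{p/2}$. For any
$\epsilon_1 > 0$ and $\epsilon_2 > 0$,

$$P\left( \|\hat{x}_o^n - x_o^n\|_2 > \epsilon_1 \right) < \epsilon_2,$$

for sufficiently large $n$.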

Proof: Let i™ denote the fc-sparse approximation of x" derived by keeping the k = n p / 2 largest coefficients of 
x™, and setting the rest to zero. According to Proposition [2j for n large enough and any 6 > 0, the Kolmogorov 
information dimension of x™ at resolution m = logn is upper bounded by 2n^ (1 + 5). By setting 6 = 0.5 we 
obtain K m .„(i™) < 3n5. Therefore, according to Theorem|5] setting t = 1 and t = 0.04 yields 



\K - x: h > y^E±l±l±lv^^ + ^±±±± ( 



0, (12) 



as $n \to \infty$, where $\epsilon_n$ is the $\ell_2$ norm of the difference between $\tilde{x}_o^n$ and $x_o^n$. This error is upper bounded by
$\|\tilde{x}_o^n - x_o^n\|_2 \le n^{\frac{p}{4} - \frac{1}{2}}$. Plugging in the values of $t$, $\tau$, and $\epsilon_n$ in (12) completes the proof. □
It is interesting to note that as the power p decreases, the number of measurements required for successful 
recovery decreases. 

C. Piecewise polynomial functions 

Let $\mathrm{Poly}_N^Q$ denote the class of piecewise polynomial functions $f : [0, 1] \to [0, 1]$ with at most $Q$ singularities$^6$
and maximum degree of $N$. For $f \in \mathrm{Poly}_N^Q$, let $(x_{o,1}, x_{o,2}, \ldots, x_{o,n})$ be the samples of $f$ at

$$0, \frac{1}{n}, \ldots, \frac{n-1}{n}.$$

6 A singularity is a point at which the function is not infinitely differentiable.

Let $\{a_i^\ell\}_{i=0}^{N_\ell}$ denote the set of coefficients of the $\ell$th polynomial of $f$, where $N_\ell \le N$ denotes its degree. For
notational simplicity, we assume that the coefficients of each polynomial belong to the $[0, 1]$ interval and that
$\sum_{i=0}^{N_\ell} a_i^\ell < 1$, for every $\ell$. Define

$$\mathcal{P} = \{x_o \in \mathbb{R}^n \mid x_{o,i} = f(i/n),\ f \in \mathrm{Poly}_N^Q\}.$$
Proposition 4: For every signal $x_o \in \mathcal{P}$, we have

$$\kappa_{m,n}(x_o) \le (Q+1)(N+1) + \frac{(Q+1)(N+1)\lceil \log_2(N+1) \rceil}{m} + \frac{\log^* n + \log^* N + \log^* k + Q \log^* n + c_1 + c_2}{m}.$$

Proof: Consider the following program for describing the quantized version of $x_o$. The code first specifies the signal
model as samples of a "piecewise polynomial" function with parameters $(n, Q, N)$. This requires $\log^* N + \log^* Q + c$
bits. Then, for each singularity point, the code specifies the largest sampling point $i/n$ that is smaller than it.
Since there are at most $Q$ singularity points, describing this information requires at most $Q \log^* n$ bits. The next
step is to describe the coefficients of each polynomial. Using an $m'$-bit uniform quantizer for each coefficient, the
induced error is bounded, for every $t \in [0, 1]$, as

$$\Big| \sum_{i=0}^{N_\ell} a_i^\ell t^i - \sum_{i=0}^{N_\ell} [a_i^\ell]_{m'} t^i \Big| \le \sum_{i=0}^{N_\ell} \big| a_i^\ell - [a_i^\ell]_{m'} \big| \le (N_\ell + 1) 2^{-m'} \le (N + 1) 2^{-m'}. \quad (13)$$

To ensure reconstructing the samples at resolution $m$, we require $(N + 1) 2^{-m'} \le 2^{-m}$. Therefore, to describe the
coefficients of the polynomials, at most $(Q + 1)(N + 1)(m + \lceil \log_2(N + 1) \rceil)$ extra bits are required. Hence, overall,
it follows that

$$\frac{K^{[\cdot]_m}(x_{o,1}, x_{o,2}, \ldots, x_{o,n})}{m} \le (Q+1)(N+1) + \frac{(Q+1)(N+1)\lceil \log_2(N+1) \rceil}{m} + \frac{\log^* N + \log^* Q + Q \log^* n + c}{m}.$$

□ 

It is straightforward to plug (14) into Corollary 1 and prove that, for large values of $n$, $O((Q + 1)(N + 2) \log n)$
measurements are sufficient for the successful recovery of piecewise polynomial functions.
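
The coefficient-quantization step in the proof of Proposition 4 can be illustrated with a short sketch. It assumes, as one reasonable reading of the proof, that the $m'$-bit uniform quantizer truncates each coefficient in $[0, 1]$ to its first $m'$ binary digits and that $m' = m + \lceil \log_2(N+1) \rceil$; the function names are hypothetical.

```python
import math


def quantize(a, bits):
    """Uniform quantizer on [0, 1]: keep the first `bits` binary digits of a."""
    return math.floor(a * 2 ** bits) / 2 ** bits


def poly_eval(coeffs, t):
    """Evaluate sum_i coeffs[i] * t**i with Horner's rule."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


def bits_per_coefficient(m, N):
    """Resolution m' = m + ceil(log2(N + 1)), so that (N + 1) 2^-m' <= 2^-m."""
    return m + math.ceil(math.log2(N + 1))


if __name__ == "__main__":
    coeffs = [0.1, 0.25, 0.3, 0.2]  # degree N = 3, coefficients in [0, 1], sum below 1
    N, m = len(coeffs) - 1, 8
    m_prime = bits_per_coefficient(m, N)
    quantized = [quantize(a, m_prime) for a in coeffs]
    # Worst deviation between the polynomial and its quantized version on a grid.
    worst = max(abs(poly_eval(coeffs, i / 100) - poly_eval(quantized, i / 100))
                for i in range(101))
    print(m_prime, worst, 2.0 ** -m)
```

Each of the $(Q+1)(N+1)$ coefficients then costs $m'$ bits, which is where the term $(Q+1)(N+1)(m + \lceil \log_2(N+1) \rceil)$ in the proof comes from.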



D. Smooth functions 

Suppose that $x_1, x_2, \ldots, x_n$ are equispaced samples of a smooth function $f : [0, 1] \to [0, 1]$. Let $\mathcal{S}^\beta$ represent
the class of $\beta + 1$ times differentiable functions. For notational simplicity we assume that $|f^{(m)}(x)| \le m!$ for
every $m \le \beta + 1$. Such a function is not necessarily a low-complexity signal, but it can be well-approximated by a
piecewise polynomial function. To show this, consider partitioning the $[0, 1]$ interval into subintervals of size $r_n$,
and approximating the function $f$ with a polynomial of degree $\beta$ in each subinterval. Let $f_\beta(x)$ denote the resulting
piecewise polynomial function. It is straightforward to prove that $\|f - f_\beta\|_\infty \le r_n^{\beta+1}$. Hence, if $x_o$ and $\tilde{x}_o$ denote
the vectors consisting of the equispaced samples of the original signal and its piecewise polynomial approximation,
respectively, it follows that $\|x_o - \tilde{x}_o\|_2 \le \sqrt{n}\, r_n^{\beta+1}$. We can summarize our discussion in the following proposition.

Fig. 1. The representation of a smooth function (solid black curve) and its piecewise polynomial approximation (dashed red). As $r_n$ becomes
smaller the approximation becomes more accurate.

Proposition 5: For $n \in \mathbb{N}$, let $x_o^n$ denote the vector of $n$ equispaced samples of $f \in \mathcal{S}^\beta$. Let $y_o^n = A x_o^n$,
where $A$ is a $d_n \times n$ random matrix with i.i.d. $\mathcal{N}(0, 1)$ entries. Also, let $\hat{x}_o^n$ denote the solution of the low-complexity
least square algorithm in (9), with $m = \log n$ and $\kappa_{m,n} = 2(2 + \beta)(n^{\frac{2}{2\beta+3}} + 1)$. Then, for $n$ large enough and
$d_n = \lceil \kappa_{m,n} \log n \rceil$ and any $\epsilon_1, \epsilon_2 > 0$, we have

$$P(\|\hat{x}_o^n - x_o^n\|_2 > \epsilon_1) < \epsilon_2.$$

Proof: Partition the $[0, 1]$ interval into subintervals of size $r_n = n^{-\frac{1}{\beta + 3/2}}$, and approximate the function $f$ with a
polynomial of degree $\beta$ in each subinterval. Let $f_\beta$ denote the resulting piecewise polynomial function. According
to Proposition 4, for $n$ sufficiently large, the Kolmogorov information dimension of the samples of $f_\beta$, $\tilde{x}_o^n$, at
resolution $m = \log n$, is less than $(n^{\frac{1}{\beta + 3/2}} + 1)(\beta + 2)(1 + \delta)$, for any $\delta > 0$. Set $\delta = 1$ and assume that $n$ is large
enough for this result to hold. By Theorem 5, if we set $t = 1$, $\tau = 0.04$, and $d_n = \lceil \kappa_{m,n} \log n \rceil$, then it follows
that



0. 



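as $n \to \infty$, where $\epsilon_n$ is the $\ell_2$ norm of the difference between $\tilde{x}_o^n$ and $x_o^n$. Furthermore, as described before,
$\epsilon_n = \|\tilde{x}_o^n - x_o^n\|_2 \le \sqrt{n}\, r_n^{\beta+1} = n^{-\frac{1}{2} + \frac{1}{2\beta+3}}$. Plugging in $t = 1$, $\tau = 0.04$, and $\epsilon_n \le n^{-\frac{1}{2} + \frac{1}{2\beta+3}}$ completes the proof. □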
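
To get a feel for the scaling in Proposition 5, the quantities $\kappa_{m,n}$, $d_n$ and the approximation error bound can be tabulated for a few values of $n$. This is only a sketch: the formulas are the ones stated above, the base-2 logarithm is an assumption (the text writes $\log n$ without a base), and the function names are illustrative.

```python
import math


def smooth_kappa(n, beta):
    """kappa_{m,n} = 2 (2 + beta) (n^(2/(2 beta + 3)) + 1), as in Proposition 5."""
    return 2 * (2 + beta) * (n ** (2.0 / (2 * beta + 3)) + 1)


def smooth_measurements(n, beta):
    """d_n = ceil(kappa_{m,n} * log n); the base-2 logarithm is an assumption."""
    return math.ceil(smooth_kappa(n, beta) * math.log2(n))


def approximation_error_bound(n, beta):
    """sqrt(n) * r_n^(beta + 1) with r_n = n^(-1 / (beta + 3/2))."""
    r_n = n ** (-1.0 / (beta + 1.5))
    return math.sqrt(n) * r_n ** (beta + 1)


if __name__ == "__main__":
    beta = 2
    for exponent in (10, 15, 20):
        n = 2 ** exponent
        d_n = smooth_measurements(n, beta)
        # Fraction of the ambient dimension that is measured, and the error bound.
        print(n, d_n, d_n / n, approximation_error_bound(n, beta))
```

Under these assumptions the ratio $d_n / n$ shrinks as $n$ grows, while the approximation error bound $n^{-1/2 + 1/(2\beta+3)}$ tends to zero.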

E. Low-rank matrices 

Let $\mathcal{C}_r(M, N)$ be the class of $M \times N$ real-valued rank-$r$ matrices $X$ with $\sigma_{\max}(X) \le 1$. The following proposition
characterizes the Kolmogorov information dimension of a matrix in this class at resolution $m$.
Proposition 6: Let $X \in \mathcal{C}_r(M, N)$. Then

$$\kappa_{m,M,N}(X) \le r(M + N + 1) + \frac{\log^* r + r(M + N + 1)\log(3r) - r + c}{m}.$$
Proof: Having access to the values of $M$, $N$ and the resolution level $m$, consider the program that describes $X$
through its singular value decomposition as follows. Denote the singular value decomposition of the matrix $X$ as
$X = U \Sigma V^T$ where $U \in \mathbb{R}^{M \times r}$, $V \in \mathbb{R}^{N \times r}$ and $\Sigma \in \mathbb{R}^{r \times r}$ is a diagonal matrix. Note that $U^T U = I_r$ and
$V^T V = I_r$. To describe $X$, first we use a constant number of bits to describe the structure of the data as a matrix
of rank $r$, and also our coding strategy, which is describing the quantized versions of $U$, $\Sigma$, and $V$. To describe
the rank $r$, the code uses $\log^* r$ bits. The next step is to describe the quantized versions of $U$, $\Sigma$ and $V$. Let
$m_u$, $m_v$, and $m_\sigma$ denote the resolution levels used in the uniform quantization of the elements of $U$, $V$, and $\Sigma$,
respectively. Hence, the quantized matrices can be described using $rMm_u + rNm_v + rm_\sigma$ bits. Let $\hat{U}$, $\hat{V}$ and
$\hat{\Sigma}$ denote the quantized versions of $U$, $V$ and $\Sigma$ at the specified resolutions, respectively. Let $\hat{X} = \hat{U} \hat{\Sigma} \hat{V}^T$. By the
triangle inequality,

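$$|X_{ij} - \hat{X}_{ij}| = |u_i^T \Sigma v_j - \hat{u}_i^T \hat{\Sigma} \hat{v}_j|
\le |u_i^T \Sigma v_j - \hat{u}_i^T \Sigma v_j| + |\hat{u}_i^T \Sigma v_j - \hat{u}_i^T \hat{\Sigma} v_j| + |\hat{u}_i^T \hat{\Sigma} v_j - \hat{u}_i^T \hat{\Sigma} \hat{v}_j|, \quad (15)$$

where $u_i^T$, $v_i^T$, $\hat{u}_i^T$, and $\hat{v}_i^T$ denote the $i$th rows of $U$, $V$, $\hat{U}$ and $\hat{V}$, respectively. Note that $|U_{ij}| \le 1$, $|V_{ij}| \le 1$, for
all $i, j$. Also by assumption, $\sigma_{\max}(\Sigma) \le 1$, and therefore $0 \le \Sigma_{ii} \le 1$, for $i = 1, \ldots, r$. Moreover, $|U_{ij} - \hat{U}_{ij}| \le 2^{-m_u+1}$,
$|V_{ij} - \hat{V}_{ij}| \le 2^{-m_v+1}$, and finally $|\Sigma_{ii} - \hat{\Sigma}_{ii}| \le 2^{-m_\sigma}$. Therefore,

$$\begin{aligned}
|X_{ij} - \hat{X}_{ij}| &\le |u_i^T \Sigma v_j - \hat{u}_i^T \Sigma v_j| + |\hat{u}_i^T \Sigma v_j - \hat{u}_i^T \hat{\Sigma} v_j| + |\hat{u}_i^T \hat{\Sigma} v_j - \hat{u}_i^T \hat{\Sigma} \hat{v}_j| \\
&\le \|u_i - \hat{u}_i\|_2 \|\Sigma v_j\|_2 + \|\hat{u}_i\|_2 \|(\Sigma - \hat{\Sigma}) v_j\|_2 + \|\hat{u}_i\|_2 \|\hat{\Sigma}(v_j - \hat{v}_j)\|_2 \\
&\le \|u_i - \hat{u}_i\|_2\, \sigma_{\max}(\Sigma) \|v_j\|_2 + \|\hat{u}_i\|_2\, \sigma_{\max}(\Sigma - \hat{\Sigma}) \|v_j\|_2 + \|\hat{u}_i\|_2\, \sigma_{\max}(\hat{\Sigma}) \|v_j - \hat{v}_j\|_2 \\
&\le r 2^{-m_u+1} + r 2^{-m_\sigma} + r 2^{-m_v+1}.
\end{aligned}$$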

To ensure reconstructing the samples at resolution $m$, we require

$$r 2^{-m_u+1} + r 2^{-m_\sigma} + r 2^{-m_v+1} \le 2^{-m+1}.$$

Setting $m_u = m_v = m_\sigma + 1$, we obtain $m_\sigma \ge m + \log(3r) - 1$. Therefore, the Kolmogorov information dimension
at resolution $m$ of $X$ is upper bounded as follows:

$$\begin{aligned}
\kappa_{m,M,N}(X) &\le \frac{\log^* r + rMm_u + rNm_v + rm_\sigma + c}{m} \\
&\le \frac{\log^* r + rM(m + \log(3r)) + rN(m + \log(3r)) + r(m + \log(3r) - 1) + c}{m} \\
&\le r(M + N + 1) + \frac{\log^* r + r(M + N + 1)\log(3r) - r + c}{m}.
\end{aligned}$$

□ 
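
The resolution bookkeeping in the proof of Proposition 6 can be checked with a small experiment. The sketch below is not the authors' code. It uses random factor matrices with entries in $[-1, 1]$ and diagonal entries in $[0, 1]$ instead of a true singular value decomposition, which is an assumption made to keep the example free of external libraries; the entrywise argument in the proof only uses these magnitude bounds. The quantizer (truncation to a grid of step $2^{-\text{bits}}$) and all function names are likewise illustrative.

```python
import math
import random


def quantize(a, bits):
    """Uniform quantizer with step 2^-bits; the error is below 2^-bits."""
    return math.floor(a * 2 ** bits) / 2 ** bits


def resolutions(m, r):
    """m_sigma = m + ceil(log2(3r)) - 1 and m_u = m_v = m_sigma + 1."""
    m_sigma = m + math.ceil(math.log2(3 * r)) - 1
    return m_sigma + 1, m_sigma + 1, m_sigma


def entry(U, s, V, i, j):
    """(U diag(s) V^T)_{ij} for row lists U (M x r), V (N x r) and diagonal s."""
    return sum(U[i][k] * s[k] * V[j][k] for k in range(len(s)))


def max_entry_error(U, s, V, m):
    """Largest |X_ij - Xhat_ij| after quantizing the three factors."""
    m_u, m_v, m_s = resolutions(m, len(s))
    Uq = [[quantize(a, m_u) for a in row] for row in U]
    Vq = [[quantize(a, m_v) for a in row] for row in V]
    sq = [quantize(a, m_s) for a in s]
    return max(abs(entry(U, s, V, i, j) - entry(Uq, sq, Vq, i, j))
               for i in range(len(U)) for j in range(len(V)))


def description_bits(M, N, r, m):
    """r M m_u + r N m_v + r m_sigma (the log* r and constant terms are omitted)."""
    m_u, m_v, m_s = resolutions(m, r)
    return r * M * m_u + r * N * m_v + r * m_s


if __name__ == "__main__":
    rng = random.Random(0)
    M, N, r, m = 6, 5, 2, 8
    U = [[rng.uniform(-1, 1) for _ in range(r)] for _ in range(M)]
    V = [[rng.uniform(-1, 1) for _ in range(r)] for _ in range(N)]
    s = [rng.random() for _ in range(r)]
    # Entrywise error next to the target 2^(-m+1), and the bit budget next to M*N*m.
    print(max_entry_error(U, s, V, m), 2.0 ** (-m + 1))
    print(description_bits(M, N, r, m), M * N * m)
```

For small ranks the factor description needs roughly $r(M + N + 1)m$ bits, against $MNm$ bits for describing every entry directly.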
V. Related work

A. Kolmogorov complexity and applications 

This paper is inspired by [17] and [13]. [17] considers the well-studied problem of estimating $\theta \in \mathbb{R}^n$ from its
noisy observation $s = \theta + z$, where $z$ represents the noise in the system. It suggests using the minimum Kolmogorov
complexity estimator (MKCE), and proves that if $\{\theta_i\}_{i=1}^n \overset{\text{i.i.d.}}{\sim} \pi$, under several scenarios for the signal and noise,
the average marginal distribution of the estimate derived by MKCE tends to the actual posterior distribution.
[13] considers the problem of CS over real-valued sequences with finite Kolmogorov complexity and defines the
Kolmogorov complexity of a real-valued sequence $x = (x_1, \ldots, x_n)$ as the length of the shortest program that prints the
binary representation of $x$ and halts. Consider the set of all real-valued sequences with Kolmogorov complexity
less than or equal to $k_0$, i.e.,

$$\mathcal{S}(k_0) = \{x : K(x) \le k_0\}.$$

Let $A$ denote a $d \times n$ binary matrix, $x_o = (x_1, x_2, \ldots, x_n)^T$, and $y_o = A x_o$. [13] proposes the following algorithm for
recovering $x_o$ from its linear measurements $y_o$:

$$\hat{x}(y_o, A) = \arg\min_{x :\, y_o = Ax} K(x). \quad (17)$$

It proves that $2k_0$ random linear measurements are sufficient for recovering sequences in $\mathcal{S}(k_0)$ with high probability.
This result does not consider any non-ideality in the signal or the measurements. Our paper settles both issues.
Furthermore, note that $\mathcal{S}(k_0)$ covers none of the classes of signals of interest in compressed sensing, such as sparse
vectors or low-rank matrices. Almost all such signals have infinite Kolmogorov complexity, and therefore are not
covered by the framework proposed in [13]. Our generalizations require completely different proof techniques.

In an independent work, [15] and [16] have explored the performance of the MCP algorithm for CS problems.
Replacing the Kolmogorov complexity with the empirical entropy, they propose a Markov chain Monte Carlo
approach similar to [18]-[20] to solve the recovery problem. The empirical results provided in [16] are very
promising. Our theoretical results explain why such algorithms perform well in practice.

Finally, we should mention that Kolmogorov complexity has proved to be useful in other applications such as
similarity detection [21], [22], density estimation [23] and compression and denoising [24]. For more information
on the progress in these areas, see [11].



B. Stochastic models 

In this paper, we considered deterministic models for the signals. While deterministic signal models are the 
most popular models in CS, stochastic models have also been extensively explored in this field. See [25]-[35] and
the references therein for more information. The most relevant to our work is [32]. It considers the problem of
recovering a memoryless process from a linear set of measurements and proves a connection between the number
of measurements required and the Rényi information dimension. The upper information dimension of a random
vector $(X_1, X_2, \ldots, X_n)$ is defined as

$$\bar{d}(X_1, \ldots, X_n) \triangleq \limsup_{m \to \infty} \frac{H([X_1]_m, \ldots, [X_n]_m)}{m}.$$

There is a connection between the Kolmogorov information dimension of a sequence and its Rényi information
dimension [12] (Theorem 14.3.1). In spite of such connections, there are several important differences between our
work and the work of [32]. First, the results in [32] are asymptotic, and the amount of error and the probability
of correct recovery for finite dimensional signals have not been established there. Second, the stochastic approach
proposed in [32] considers a specific distribution that is assumed to be known in the recovery process, while we
are considering universal schemes in this paper.

C. Universal schemes and minimum entropy coder 

Our work has some connections with the minimum entropy decoder proposed by Csiszár in [36]. He suggests
a universal minimum entropy decoder for reconstructing an i.i.d. signal from its linear measurements at a rate
determined by the entropy of the source. For more information, see [37], [38] and the references therein.

Finally, we should emphasize that universal algorithms (that perform "optimally" without knowing the distribution
of the data) have been explored extensively in information theory and are popular in many applications, including
compression [19], [39], denoising [40], prediction [41], and many more. However, to the best of our knowledge
our results provide the first universal approach for CS.

D. Signal models 

As mentioned in the introduction, in this paper we address a central problem in the field of compressed sensing. 
Since the early days of CS, there have been many efforts to push the limits of the technique beyond sparsity. This 
line of work has resulted in a series of papers each of which either generalizes the signal model or reduces the 
required number of measurements by introducing more structure on the signal; see, for example, [42]-[48]. As
proved in Section IV, some of these models can be considered as subclasses of the general model we consider here.
However, it is worth noting that even though the MCP algorithm proposed here is universal, since Kolmogorov 
complexity is not computable, it is not immediately useful for practical purposes. 

VI. Proofs of the main results 

A. Useful lemmas 

The following lemmas are frequently used in our proofs. 

Lemma 2 ($\chi^2$ concentration): Fix $\tau > 0$, and let $Z_i \sim \mathcal{N}(0, 1)$, $i = 1, 2, \ldots, d$. Then,

$$P\Big( \sum_{i=1}^d Z_i^2 < d(1 - \tau) \Big) \le e^{\frac{d}{2}(\tau + \log(1 - \tau))}$$

and

$$P\Big( \sum_{i=1}^d Z_i^2 > d(1 + \tau) \Big) \le e^{-\frac{d}{2}(\tau - \log(1 + \tau))}. \quad (18)$$
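Proof: By the Markov inequality, for any $\lambda > 0$, we have

$$\begin{aligned}
P\Big( \sum_{i=1}^d Z_i^2 < d(1 - \tau) \Big) &\le e^{-\lambda d \tau}\, E\big[ e^{\lambda (d - \sum_{i=1}^d Z_i^2)} \big] \\
&= e^{-\lambda d \tau + \lambda d} \big( E[e^{-\lambda Z_1^2}] \big)^d \\
&= e^{-\lambda d \tau + \lambda d} (1 + 2\lambda)^{-d/2}. \quad (19)
\end{aligned}$$

We optimize over $\lambda$ to obtain

$$\lambda^* = \frac{\tau}{2(1 - \tau)}. \quad (20)$$

Plugging (20) into (19), we obtain (18). □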
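
The lower-tail bound of Lemma 2 and the optimization step in its proof can be sanity-checked numerically. The following sketch evaluates both bounds, confirms that the exponent of (19) at $\lambda^* = \tau / (2(1-\tau))$ reproduces the stated bound, and compares it with a Monte Carlo estimate. The function names, the sample sizes and the random seed are arbitrary choices made for illustration.

```python
import math
import random


def lower_tail_bound(d, tau):
    """Bound on P(sum Z_i^2 < d(1 - tau)) for 0 < tau < 1."""
    return math.exp(0.5 * d * (tau + math.log(1.0 - tau)))


def upper_tail_bound(d, tau):
    """Bound on P(sum Z_i^2 > d(1 + tau)) for tau > 0."""
    return math.exp(-0.5 * d * (tau - math.log(1.0 + tau)))


def chernoff_exponent(lam, d, tau):
    """Exponent of (19): -lam d tau + lam d - (d/2) log(1 + 2 lam)."""
    return -lam * d * tau + lam * d - 0.5 * d * math.log(1.0 + 2.0 * lam)


def empirical_lower_tail(d, tau, trials, rng):
    """Monte Carlo estimate of P(sum Z_i^2 < d(1 - tau))."""
    hits = 0
    for _ in range(trials):
        total = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
        hits += total < d * (1.0 - tau)
    return hits / trials


if __name__ == "__main__":
    d, tau = 20, 0.3
    lam_star = tau / (2.0 * (1.0 - tau))
    # The optimized Chernoff bound and the closed form of Lemma 2 should agree.
    print(math.exp(chernoff_exponent(lam_star, d, tau)), lower_tail_bound(d, tau))
    # The empirical frequency should sit below the bound.
    print(empirical_lower_tail(d, tau, 5000, random.Random(0)))
    print(upper_tail_bound(d, tau))
```

The Monte Carlo estimate is typically well below the bound, which is expected of a Chernoff-type inequality.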
Lemma 3: Let $X$ and $Y$ denote two independent Gaussian vectors of length $n$ with i.i.d. elements. Further,
assume that for $i = 1, \ldots, n$, $X_i \sim \mathcal{N}(0, 1)$ and $Y_i \sim \mathcal{N}(0, 1)$. Then the distribution of $X^T Y = \sum_{i=1}^n X_i Y_i$ is the
same as the distribution of $\|X\|_2 G$, where $G \sim \mathcal{N}(0, 1)$ is independent of $\|X\|_2$.
Proof: Note that

$$\frac{X^T Y}{\|X\|_2} = \sum_{i=1}^n \frac{X_i}{\|X\|_2} Y_i.$$

Given $X / \|X\|_2 = a$,

$$\frac{X^T Y}{\|X\|_2} = \sum_{i=1}^n a_i Y_i \sim \mathcal{N}(0, \|a\|_2^2) = \mathcal{N}(0, 1),$$

because $\|a\|_2^2 = 1$. Therefore, since the distribution of $X^T Y / \|X\|_2$ given $X / \|X\|_2 = a$ is independent of the value
of $a$, the unconditional distribution of $X^T Y / \|X\|_2$ is also $\mathcal{N}(0, 1)$. To prove independence, note that $X / \|X\|_2$ and
$Y$ are both independent of $\|X\|_2$. □

The following lemma is adapted from [49] (Proposition 5.10).

Lemma 4: Let $Z_1, Z_2, \ldots, Z_n$ be i.i.d. zero-mean $\mathrm{SG}(c_1, c_2)$ random variables. Let $a = (a_1, a_2, \ldots, a_n) \in \mathbb{R}^n$
be a vector satisfying $\|a\|_2^2 = 1$. Then

$$P\Big( \Big| \sum_{i=1}^n a_i Z_i \Big| > t \Big) \le c_1 e^{-c_2 t^2}.$$

In other words, $\sum_{i=1}^n a_i Z_i$ is also $\mathrm{SG}(c_1, c_2)$.

Definition 3: A random variable $X$ is called subexponential, denoted by $\mathrm{SE}(c_1, c_2)$, if and only if

$$P(|X| > t) \le c_1 e^{-c_2 t}.$$

Slightly modified versions of the proofs we provide in the rest of this section can be found in [49]. For the sake
of clarity and uniformity we state these lemmas with their proofs here.

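Lemma 5: Let $Z$ be a $\mathrm{SE}(c_1, c_2)$ random variable. Then, it follows that

$$E[|Z|^p] \le \frac{2 c_1 p!}{c_2^p}.$$

Proof: Here we prove this lemma for the case where $p$ is even. The other case follows the same approach. Let
$F(z)$ denote the cumulative distribution function of the random variable $Z$. Then

$$\begin{aligned}
E[|Z|^p] &= \int_0^\infty z^p \, dF(z) + \int_{-\infty}^0 z^p \, dF(z) \\
&\overset{(a)}{=} \int_0^\infty p z^{p-1} \int_z^\infty dF(x) \, dz - \int_{-\infty}^0 p z^{p-1} \int_{-\infty}^z dF(x) \, dz \\
&\le \int_0^\infty p z^{p-1} c_1 e^{-c_2 z} \, dz - \int_{-\infty}^0 p z^{p-1} c_1 e^{c_2 z} \, dz = \frac{2 c_1 p!}{c_2^p}.
\end{aligned}$$

Equality (a) is the result of integration by parts. □

Lemma 6: Let $Z$ be a zero-mean $\mathrm{SE}(c_1, c_2)$ random variable. Then we have

$$E[e^{\lambda Z}] \le e^{4 c_1 \lambda^2 / c_2^2}, \quad \forall \lambda < c_2 / 2.$$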



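Proof: We prove this lemma by expanding the exponential function $e^{\lambda Z}$ and bounding the moments using
Lemma 5 as follows:

$$E[e^{\lambda Z}] = E\Big[ 1 + \lambda Z + \sum_{k=2}^\infty \frac{(\lambda Z)^k}{k!} \Big] \le 1 + 2 c_1 \sum_{k=2}^\infty \Big( \frac{\lambda}{c_2} \Big)^k = 1 + 2 c_1 \frac{\lambda^2}{c_2^2} \cdot \frac{1}{1 - \lambda / c_2}. \quad (22)$$

Assuming that $\lambda / c_2 < 1/2$, we obtain

$$E[e^{\lambda Z}] \le 1 + 4 c_1 \Big( \frac{\lambda}{c_2} \Big)^2 \le e^{4 c_1 \lambda^2 / c_2^2},$$

where the last inequality is due to the fact that $1 + x \le e^x$ for $x \ge 0$. □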

Lemma 7: Let $Z_1, \ldots, Z_n$ be i.i.d. $\mathrm{SG}(c_1, c_2)$ random variables with mean zero and variance 1. Then we
have

$$P\Big( \Big| \sum_{i=1}^n (Z_i^2 - 1) \Big| > nt \Big) \le 2 e^{-n c_2^2 t^2 / (16 c_3)}, \quad \text{for } t \in \Big( 0, \frac{4 c_3}{c_2} \Big),$$

where $c_3 = \max(e^{c_2}, c_1 e^{-c_2})$.

Proof: Define $X_i = Z_i^2 - 1$. It is straightforward to confirm that for all $t \ge 1$,

$$P(|X_i| > t) \le c_1 e^{-c_2 (t + 1)}. \quad (23)$$

Define $c_3 = \max(e^{c_2}, c_1 e^{-c_2})$. If we combine the fact that $P(|X_i| > t) \le 1$ for $0 < t < 1$ with (23), we obtain

$$P(|X_i| > t) \le c_3 e^{-c_2 t}.$$

We have

$$P\Big( \sum_{i=1}^n X_i > nt \Big) = P\big( e^{\lambda \sum_i X_i} > e^{\lambda n t} \big) \le e^{-\lambda n t} \big( E[e^{\lambda X_1}] \big)^n \le e^{-\lambda n t + 4 n c_3 \lambda^2 / c_2^2}, \quad (24)$$

where the last inequality is the result of Lemma 6. Assuming $t < 4 c_3 / c_2$ and setting $\lambda = t c_2^2 / (8 c_3)$, we obtain

$$P\Big( \sum_{i=1}^n X_i > nt \Big) \le e^{-n c_2^2 t^2 / (16 c_3)}.$$

Using the same argument we find a similar upper bound for $P\big( \sum_{i=1}^n X_i < -nt \big)$. □
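
For completeness, the arithmetic behind the last step of this proof can be written out. The sketch below takes the exponent $-\lambda n t + 4 n c_3 \lambda^2 / c_2^2$ as the bound delivered by Lemma 6 for $\mathrm{SE}(c_3, c_2)$ variables (this is our reading of (24)) and substitutes $\lambda = t c_2^2 / (8 c_3)$:

```latex
\begin{aligned}
-\lambda n t + \frac{4 n c_3 \lambda^2}{c_2^2}
  &= -\frac{n c_2^2 t^2}{8 c_3}
     + \frac{4 n c_3}{c_2^2} \cdot \frac{t^2 c_2^4}{64 c_3^2} \\
  &= -\frac{n c_2^2 t^2}{8 c_3} + \frac{n c_2^2 t^2}{16 c_3}
   = -\frac{n c_2^2 t^2}{16 c_3}.
\end{aligned}
```

The condition $\lambda < c_2 / 2$ required by Lemma 6 then reads $t c_2^2 / (8 c_3) < c_2 / 2$, i.e., $t < 4 c_3 / c_2$.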

Lemma 8: Let $A$ be a $d \times n$ matrix with i.i.d. $\mathrm{SG}(c_1, c_2)$ elements, and suppose that the elements satisfy
$E(A_{ij}) = 0$ and $E(A_{ij}^2) = 1$. Then there exist two constants $c_1', c_2'$ depending only on $c_1$ and $c_2$ such that with
probability at least $1 - e^{-c_1' n}$,

$$\sigma_{\max}(A) \le \sqrt{d} + (c_2' + 1) \sqrt{n}.$$

Proof: See Theorem 5.39 in [49] for more information on the proof and the constants that are involved. □
B. Proof of Theorem 2

Let $\hat{x}_o$ denote the solution of MCP. Also, let $q_m = x_o - \phi_m(x_o)$ and $\hat{q}_m = \hat{x}_o - \phi_m(\hat{x}_o)$ denote the quantization
errors of the original and the reconstructed signals at resolution $m$, respectively, where for $x \in [0, 1]^n$, $\phi_m(x)$ is
defined in Remark Q] 

Since both $A x_o = y_o$ and $A \hat{x}_o = y_o$, it follows that

$$A(\phi_m(x_o) + q_m) = A(\phi_m(\hat{x}_o) + \hat{q}_m)$$

and

$$A(\phi_m(x_o) - \phi_m(\hat{x}_o)) = A(\hat{q}_m - q_m). \quad (25)$$

On the other hand, by our definition in (1), it follows that

$$\|q_m - \hat{q}_m\|_2^2 \le n 2^{-2m+2}.$$

Hence,

$$\|A(\phi_m(x_o) - \phi_m(\hat{x}_o))\|_2 = \|A(q_m - \hat{q}_m)\|_2 \le \sigma_{\max}(A) \sqrt{n 2^{-2m+2}}, \quad (26)$$

where $\sigma_{\max}(A)$ is the maximum singular value of the matrix $A$. By definition, $K^{[\cdot]_m}(x_o) \le \kappa_{m,n} m$, and since $\hat{x}_o$ is
the solution of MCP, we have

$$K^{[\cdot]_m}(\hat{x}_o) \le K^{[\cdot]_m}(x_o) \le \kappa_{m,n} m. \quad (27)$$

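Define the set $\mathcal{S}$ as

$$\mathcal{S} = \{h : h = \phi_m(x_o) - \phi_m(\hat{x}_o),\ x_o, \hat{x}_o \in [0, 1]^n,\ K(\phi_m(x_o)) \le \kappa_{m,n} m,\ K(\phi_m(\hat{x}_o)) \le \kappa_{m,n} m\}.$$

Define the event $\mathcal{E}_1^{(n)}$ as

$$\mathcal{E}_1^{(n)} \triangleq \{\forall h \in \mathcal{S} : \|Ah\|_2 \ge \tau \sqrt{d}\, \|h\|_2\}, \quad (28)$$

and, for $t > 0$, the event $\mathcal{E}_2^{(n)}$ as

$$\mathcal{E}_2^{(n)} \triangleq \{\sigma_{\max}(A) - \sqrt{d} - \sqrt{n} \le t \sqrt{d}\}. \quad (29)$$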
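Let $\epsilon = \big( \tau^{-1} (\sqrt{n d^{-1}} + 1 + t) + 1 \big) \sqrt{n 2^{-2m+2}}$. Using these definitions and the union bound, we have

$$\begin{aligned}
P(\|x_o - \hat{x}_o\|_2 > \epsilon) &= P\big( \|x_o - \hat{x}_o\|_2 > \epsilon,\ \mathcal{E}_1^{(n)} \cap \mathcal{E}_2^{(n)} \big) + P\big( \|x_o - \hat{x}_o\|_2 > \epsilon,\ (\mathcal{E}_1^{(n)} \cap \mathcal{E}_2^{(n)})^c \big) \\
&\le P\big( \|x_o - \hat{x}_o\|_2 > \epsilon,\ \mathcal{E}_1^{(n)} \cap \mathcal{E}_2^{(n)} \big) + P\big( (\mathcal{E}_1^{(n)} \cap \mathcal{E}_2^{(n)})^c \big) \\
&\le P\big( \|x_o - \hat{x}_o\|_2 > \epsilon,\ \mathcal{E}_1^{(n)} \cap \mathcal{E}_2^{(n)} \big) + P\big( \mathcal{E}_1^{(n),c} \big) + P\big( \mathcal{E}_2^{(n),c} \big). \quad (30)
\end{aligned}$$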



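But if $A \in \mathcal{E}_1^{(n)} \cap \mathcal{E}_2^{(n)}$, then it follows that

$$\begin{aligned}
\|x_o - \hat{x}_o\|_2 &= \|\phi_m(x_o) + q_m - \phi_m(\hat{x}_o) - \hat{q}_m\|_2 \\
&\le \|\phi_m(x_o) - \phi_m(\hat{x}_o)\|_2 + \|q_m - \hat{q}_m\|_2 \\
&\overset{(a)}{\le} (\tau \sqrt{d})^{-1} \|A(\phi_m(x_o) - \phi_m(\hat{x}_o))\|_2 + \|q_m - \hat{q}_m\|_2 \\
&\overset{(b)}{\le} (\tau \sqrt{d})^{-1} \sigma_{\max}(A) \sqrt{n 2^{-2m+2}} + \sqrt{n 2^{-2m+2}} \\
&= \big( (\tau \sqrt{d})^{-1} \sigma_{\max}(A) + 1 \big) \sqrt{n 2^{-2m+2}} \\
&\overset{(c)}{\le} \big( \tau^{-1} (\sqrt{n d^{-1}} + 1 + t) + 1 \big) \sqrt{n 2^{-2m+2}}. \quad (31)
\end{aligned}$$

Inequality (a) holds due to the assumption that $A \in \mathcal{E}_1^{(n)}$, and therefore $\|A(\phi_m(x_o) - \phi_m(\hat{x}_o))\|_2 \ge \tau \sqrt{d}\, \|\phi_m(x_o) - \phi_m(\hat{x}_o)\|_2$.
Inequality (b) is the result of (26), and Inequality (c) is due to the fact that $A \in \mathcal{E}_2^{(n)}$. Hence,

$$P\big( \|x_o - \hat{x}_o\|_2 > \epsilon,\ \mathcal{E}_1^{(n)} \cap \mathcal{E}_2^{(n)} \big) = 0. \quad (32)$$

On the other hand, by Lemma 2, for any sequence $x \in \mathbb{R}^n$,

$$P\big( \|Ax\|_2 \le \tau \sqrt{d}\, \|x\|_2 \big) = P\Big( \sum_{i=1}^d Z_i^2 \le d \tau^2 \Big) \le e^{\frac{d}{2}(1 - \tau^2 + 2 \log \tau)},$$

where, for $i = 1, \ldots, d$, $Z_i = \|x\|_2^{-1} \sum_{j=1}^n A_{i,j} x_j$. Using the union bound, it follows that

$$P\big( \mathcal{E}_1^{(n),c} \big) \le 2^{2 \kappa_{m,n} m} e^{\frac{d}{2}(1 - \tau^2 + 2 \log \tau)}. \quad (33)$$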

Finally, using the results on the concentration of Lipschitz functions of a Gaussian random vector [50], we obtain

$$P\big( \mathcal{E}_2^{(n),c} \big) = P\big( \sigma_{\max}(A) - \sqrt{d} - \sqrt{n} > t \sqrt{d} \big) \le e^{-d t^2 / 2}. \quad (34)$$

Plugging (32), (33), and (34) into (30) completes the proof. □
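
The quantities that appear in this proof are simple to evaluate, which helps when exploring how $d$, $\tau$ and $t$ trade off against each other. The sketch below computes the error threshold $\epsilon$ and the logarithms of the two failure-probability bounds (33) and (34); working with logarithms avoids overflow in the factor $2^{2\kappa_{m,n} m}$. The function names and the parameter values in the demo are arbitrary and are not taken from the paper.

```python
import math


def recovery_error_threshold(n, d, m, tau, t):
    """epsilon = (tau^-1 (sqrt(n/d) + 1 + t) + 1) * sqrt(n 2^(-2m+2))."""
    return ((math.sqrt(n / d) + 1.0 + t) / tau + 1.0) * math.sqrt(n) * 2.0 ** (-m + 1)


def log_failure_bound_e1(d, kappa, m, tau):
    """Natural log of 2^(2 kappa m) * exp((d/2)(1 - tau^2 + 2 log tau)), as in (33)."""
    return (2.0 * kappa * m * math.log(2.0)
            + 0.5 * d * (1.0 - tau ** 2 + 2.0 * math.log(tau)))


def log_failure_bound_e2(d, t):
    """Natural log of exp(-d t^2 / 2), as in (34)."""
    return -0.5 * d * t ** 2


if __name__ == "__main__":
    n, kappa, m, tau, t = 2 ** 16, 10, 16, 0.04, 1.0
    for d in (80, 160, 320):
        print(d,
              recovery_error_threshold(n, d, m, tau, t),
              log_failure_bound_e1(d, kappa, m, tau),
              log_failure_bound_e2(d, t))
```

A negative value of `log_failure_bound_e1` indicates that the union bound over the set $\mathcal{S}$ is already informative for that choice of $d$.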

C. Proof of Theorem 3

Let $\hat{x}_o$ denote the solution of

$$\arg\min \|Ax - y_o\|_2^2, \quad \text{s.t. } K^{[\cdot]_m}(x) \le \kappa_{m,n} m. \quad (35)$$

Since, by the assumption of the theorem, $K^{[\cdot]_m}(x_o) \le \kappa_{m,n} m$, $x_o$ is also a feasible point in (35). Therefore,

$$\|A \hat{x}_o - y_o\|_2^2 \le \|A x_o - y_o\|_2^2 = \|A x_o - A x_o - w\|_2^2 = \|w\|_2^2. \quad (36)$$

Expanding $\|A \hat{x}_o - y_o\|_2^2 = \|A \hat{x}_o - A x_o - w\|_2^2$ in (36), it follows that

$$\|A(\hat{x}_o - x_o)\|_2^2 + \|w\|_2^2 - 2 w^T A(\hat{x}_o - x_o) \le \|w\|_2^2. \quad (37)$$

Cancelling $\|w\|_2^2$ from both sides of (37), we obtain

$$\|A(\hat{x}_o - x_o)\|_2^2 \le 2 w^T A(\hat{x}_o - x_o) \le 2 |w^T A(\hat{x}_o - x_o)|.$$

Let $q_m = x_o - \phi_m(x_o)$ and $\hat{q}_m = \hat{x}_o - \phi_m(\hat{x}_o)$ denote the quantization errors of the original and the reconstructed
signals, respectively. On the one hand, using these definitions and the Cauchy-Schwarz inequality, we find a lower
bound for $\|A(\hat{x}_o - x_o)\|_2^2$ as

$$\begin{aligned}
\|A(\hat{x}_o - x_o)\|_2^2 &= \|A(\phi_m(\hat{x}_o) + \hat{q}_m - \phi_m(x_o) - q_m)\|_2^2 \\
&= \|A(\phi_m(\hat{x}_o) - \phi_m(x_o)) + A(\hat{q}_m - q_m)\|_2^2 \\
&\ge \|A(\phi_m(\hat{x}_o) - \phi_m(x_o))\|_2^2 - 2 \big| (\hat{q}_m - q_m)^T A^T A (\phi_m(\hat{x}_o) - \phi_m(x_o)) \big| \\
&\ge \|A(\phi_m(\hat{x}_o) - \phi_m(x_o))\|_2^2 - 2 \|A(\hat{q}_m - q_m)\|_2 \|A(\phi_m(\hat{x}_o) - \phi_m(x_o))\|_2. \quad (38)
\end{aligned}$$

On the other hand, again using our definitions plus the Cauchy-Schwarz inequality, we find an upper bound on
$|w^T A(\hat{x}_o - x_o)|$ as

$$\begin{aligned}
|w^T A(\hat{x}_o - x_o)| &= \big| (\phi_m(\hat{x}_o) - \phi_m(x_o) + \hat{q}_m - q_m)^T A^T w \big| \\
&\le \big| (\phi_m(\hat{x}_o) - \phi_m(x_o))^T A^T w \big| + \big| (\hat{q}_m - q_m)^T A^T w \big| \\
&\le \big| (\phi_m(\hat{x}_o) - \phi_m(x_o))^T A^T w \big| + \|\hat{q}_m - q_m\|_2 \|A^T w\|_2. \quad (39)
\end{aligned}$$

For any $z \in [0, 1]$, $0 \le z - [z]_m < 2^{-m}$. Therefore,

$$\|q_m - \hat{q}_m\|_2 \le \sqrt{n 2^{-2m+2}}. \quad (40)$$
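
The bound (40) can be illustrated with a few lines of code. The sketch assumes that $[z]_m$ truncates $z \in [0, 1]$ to its first $m$ binary digits, which is consistent with $0 \le z - [z]_m < 2^{-m}$ but is an assumption about the quantizer defined earlier in the paper; the function names are illustrative.

```python
import math
import random


def quantize_vector(x, m):
    """phi_m(x): keep the first m binary digits of every coordinate in [0, 1]."""
    return [math.floor(v * 2 ** m) / 2 ** m for v in x]


def quantization_error(x, m):
    """q_m = x - phi_m(x); every coordinate lies in [0, 2^-m)."""
    return [v - q for v, q in zip(x, quantize_vector(x, m))]


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in v))


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 256, 6
    x = [rng.random() for _ in range(n)]
    x_hat = [rng.random() for _ in range(n)]
    q = quantization_error(x, m)
    q_hat = quantization_error(x_hat, m)
    diff = [a - b for a, b in zip(q, q_hat)]
    # Norm of the difference of the two error vectors next to the bound in (40).
    print(l2_norm(diff), math.sqrt(n * 2.0 ** (-2 * m + 2)))
```

Since every coordinate of $q_m - \hat{q}_m$ has magnitude below $2^{-m}$, the bound $\sqrt{n 2^{-2m+2}} = \sqrt{n}\, 2^{-m+1}$ holds with room to spare.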
Let $\Delta_m = \phi_m(\hat{x}_o) - \phi_m(x_o)$. Define the set $\mathcal{S}$ as follows

$$\mathcal{S} = \{h : h = \phi_m(\hat{x}_o) - \phi_m(x_o),\ x_o, \hat{x}_o \in [0, 1]^n,\ K(\phi_m(\hat{x}_o)) \le \kappa_{m,n} m,\ K(\phi_m(x_o)) \le \kappa_{m,n} m\}.$$

Note that $|\mathcal{S}| \le 2^{2 \kappa_{m,n} m}$. For $t_1 > 0$, define the event $\mathcal{E}_1^{(n)}$ as

$$\mathcal{E}_1^{(n)} \triangleq \{\forall h \in \mathcal{S} : |w^T A h| \le t_1 \|h\|_2\}.$$

For any $h \in \mathbb{R}^n$, $Ah$ is a vector of length $d$ with i.i.d. entries distributed as $\mathcal{N}(0, \|h\|_2^2)$. Assuming that $\|h\|_2 = 1$
and applying Lemma 3, it follows that

P (|w T Ah| > ti) = P (||w|| 2 G > h) 

= P (||w|| 2 G > t u ||w|| 2 > Vda{l + t 2 )) 
+ P (||w|| 2 G > t 1} ||w|| 2 < Vda(l + t 2 )) 

< P (||w|| 2 > \fda(l + t 2 )) 
+ p(G>ti(%/dCT(l + t 2 ))- 1 

2 tf 

< e - dt 2/ 2 _|_ g 2^d(l + t 2 ) _ (4^-) 

Hence, by the union bound and the fact that |5| < 2 2Km " m , we obtain 

P(£ i ( " ),c ) < 2 2Km -" m ^ dt ^ /2 + e ~ 2 ^<w2) j . (42) 

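Note that

$$\|A(\hat{q}_m - q_m)\|_2 \le \sigma_{\max}(A) \|\hat{q}_m - q_m\|_2.$$

For $t_3 > 0$, define the event $\mathcal{E}_2^{(n)}$ as

$$\mathcal{E}_2^{(n)} \triangleq \{\sigma_{\max}(A) \le (1 + t_3) \sqrt{d} + \sqrt{n}\}.$$

It can be proved that [50]

$$P\big( \mathcal{E}_2^{(n),c} \big) \le e^{-d t_3^2 / 2}. \quad (43)$$

But if $\sigma_{\max}(A) \le (1 + t_3) \sqrt{d} + \sqrt{n}$, then from (40)

$$\|A(\hat{q}_m - q_m)\|_2 \le \Big( 1 + (1 + t_3) \sqrt{\frac{d}{n}} \Big) 2^{-m+1} n.$$

For $0 < t_4 < 1$, define the event $\mathcal{E}_3^{(n)}$ as

$$\mathcal{E}_3^{(n)} \triangleq \{\forall h \in \mathcal{S} : \|Ah\|_2^2 \ge (1 - t_4) d \|h\|_2^2\}.$$

By the union bound and Lemma 2 (similar to the proof of (33) in the proof of Theorem 2), it follows that

$$P\big( \mathcal{E}_3^{(n),c} \big) \le 2^{2 \kappa_{m,n} m} e^{\frac{d}{2}(t_4 + \log(1 - t_4))}. \quad (44)$$

For $t_5 > 0$, define the event $\mathcal{E}_4^{(n)}$ as

$$\mathcal{E}_4^{(n)} \triangleq \{\forall h \in \mathcal{S} : \|Ah\|_2^2 \le (1 + t_5) d \|h\|_2^2\}.$$

Again by the union bound and Lemma 2, it follows that

$$P\big( \mathcal{E}_4^{(n),c} \big) \le 2^{2 \kappa_{m,n} m} e^{-\frac{d}{2}(t_5 - \log(1 + t_5))}. \quad (45)$$

Finally, for $t_6 > 0$, define

$$\mathcal{E}_5^{(n)} \triangleq \{\|A^T w\|_2^2 \le n d (1 + t_6)\}.$$

Given $w$, $A^T w$ is an $n$-dimensional i.i.d. Gaussian random vector with variance $\|w\|_2^2$. Hence, by Lemma 2,

$$P\big( \|A^T w\|_2^2 > n \gamma^2 (1 + t_7) \,\big|\, \|w\|_2^2 = \gamma^2 \big) \le e^{-\frac{n}{2}(t_7 - \log(1 + t_7))}.$$

On the other hand, again by Lemma 2,

$$P\big( \|w\|_2^2 < d(1 - t_8) \big) \le e^{\frac{d}{2}(t_8 + \log(1 - t_8))}.$$

Choosing $t_6, t_7, t_8 > 0$ such that $t_6 < t_7$ and $1 + t_6 = (1 - t_8)(1 + t_7)$, it follows that

$$\begin{aligned}
P\big( \|A^T w\|_2^2 > n d (1 + t_6) \big) &= P\big( \|A^T w\|_2^2 > n d (1 + t_6),\ \|w\|_2^2 > d(1 - t_8) \big) \\
&\quad + P\big( \|A^T w\|_2^2 > n d (1 + t_6),\ \|w\|_2^2 \le d(1 - t_8) \big) \\
&\le e^{-\frac{n}{2}(t_7 - \log(1 + t_7))} + e^{\frac{d}{2}(t_8 + \log(1 - t_8))}. \quad (46)
\end{aligned}$$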

Combining ( |38| ) and ( |39| ) and the upper and lower bounds derived on the corresponding terms of ( |38) and ( |39| l and 
choosing t\ = lo^J d(\ + t2)2n m _ n m, with probability P(£i n £2 H £3 n £4 n £5), the following inequality holds: 

(l-U)Vd\\A m \\ 2 2 

< 2 (yTT^2- m + V^((l + t 3 )Vd + Vn))) || A m || 2 

+ 2 (<7Vl + *2A/2K m ,nm + c) || A m || 2 

+ 2- m+1 VTTten. (47) 

Inequality |47) involves a quadratic equation in ||A m ||2. Finding the roots of this quadratic equation, and using 
\fl + x < X + x/2, we obtain 



||A m ||2 G^fZX (2/Cm,n 

m)d 1 

+ 2- m+1 ( 7l ^ + 72 n%/d rT ) 

+ n2""Vd ZI 74 , (48) 
where $\gamma_1 = \sqrt{1 + t_5}\,(1 + t_3)(1 - t_4)^{-1}$, $\gamma_2 = \sqrt{1 + t_5}\,(1 - t_4)^{-1}$, $\gamma_3 = \sqrt{1 + t_2}\,(1 - t_4)^{-1}$ and $\gamma_4 = \sqrt{1 + t_6}\,(1 - t_4)^{-1}$.
Note that the terms $\sigma \gamma_3 \sqrt{(2 \kappa_{m,n} m) d^{-1}}$ and $2^{-m+1}(\gamma_1 \sqrt{n} + \gamma_2 n \sqrt{d^{-1}}) + n 2^{-m} \sqrt{d^{-1}}\, \gamma_4$ are the variance
and bias of the estimator given in (35), respectively. Suppose that $d$ is fixed and $\kappa_{m,n}$ is independent of $m$. Then
as $m$ increases the bias of the estimator decreases, since we are applying a very fine quantization to the signal, but
the variance increases. This is the well-known bias-variance trade-off. By the union bound,

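$$\begin{aligned}
P\big( (\mathcal{E}_1^{(n)} \cap \mathcal{E}_2^{(n)} \cap \mathcal{E}_3^{(n)} \cap \mathcal{E}_4^{(n)} \cap \mathcal{E}_5^{(n)})^c \big)
&= P\big( \mathcal{E}_1^{(n),c} \cup \mathcal{E}_2^{(n),c} \cup \mathcal{E}_3^{(n),c} \cup \mathcal{E}_4^{(n),c} \cup \mathcal{E}_5^{(n),c} \big) \\
&\le P(\mathcal{E}_1^{(n),c}) + P(\mathcal{E}_2^{(n),c}) + P(\mathcal{E}_3^{(n),c}) + P(\mathcal{E}_4^{(n),c}) + P(\mathcal{E}_5^{(n),c}). \quad (49)
\end{aligned}$$

Setting $m = \lceil \log n \rceil$ and $d = 8 r \kappa_{m,n} m$ as proposed in Theorem 3, it is straightforward to confirm that as $n$ grows
to infinity, for all $\ell \in \{1, 2, 3, 4, 5\}$,

$$P(\mathcal{E}_\ell^{(n),c}) \to 0.$$

Furthermore, except for $\sigma \gamma_3 \sqrt{(2 \kappa_{m,n} m) d^{-1}}$, as $n$ grows to infinity, all the terms in (48) also converge to zero.
This completes the proof of Theorem 3. □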

D. Proof of Theorem 5

Since the proof of this theorem is similar to the proof of Theorem 2, we skip most of the steps and only emphasize
the main differences. Let $\hat{x}_o$ denote the solution of (9). Define $q_m$, $\tilde{q}_m$, and $\hat{q}_m$ as the quantization error vectors
of $x_o$, $\tilde{x}_o$, and $\hat{x}_o$, respectively, i.e., $q_m = x_o - \phi_m(x_o)$, $\tilde{q}_m = \tilde{x}_o - \phi_m(\tilde{x}_o)$, and $\hat{q}_m = \hat{x}_o - \phi_m(\hat{x}_o)$. Since
$\|A \tilde{x}_o - y_o\|_2 = \|A(\tilde{x}_o - x_o)\|_2 \le \sigma_{\max}(A) \epsilon_n$ and $\hat{x}_o$ is the minimizer of (9), it follows that $\|A \hat{x}_o - y_o\|_2 \le
\sigma_{\max}(A) \epsilon_n$. Therefore,

$$\|A \hat{x}_o - A \tilde{x}_o\|_2 = \|A \hat{x}_o - y_o - (A \tilde{x}_o - y_o)\|_2 \le 2 \sigma_{\max}(A) \epsilon_n. \quad (50)$$

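Again, by the triangle inequality,

$$\begin{aligned}
\|A \hat{x}_o - A \tilde{x}_o\|_2 &= \|A(\phi_m(\hat{x}_o) + \hat{q}_m) - A(\phi_m(\tilde{x}_o) + \tilde{q}_m)\|_2 \\
&\ge \|A(\phi_m(\hat{x}_o) - \phi_m(\tilde{x}_o))\|_2 - \|A(\hat{q}_m - \tilde{q}_m)\|_2 \\
&\ge \|A(\phi_m(\hat{x}_o) - \phi_m(\tilde{x}_o))\|_2 - \sigma_{\max}(A) \|\hat{q}_m - \tilde{q}_m\|_2 \\
&\ge \|A(\phi_m(\hat{x}_o) - \phi_m(\tilde{x}_o))\|_2 - \sigma_{\max}(A) \sqrt{n 2^{-2m+2}}. \quad (51)
\end{aligned}$$

Combining (50) and (51), it follows that

$$\|A(\phi_m(\hat{x}_o) - \phi_m(\tilde{x}_o))\|_2 \le \sigma_{\max}(A) \sqrt{n 2^{-2m+2}} + 2 \sigma_{\max}(A) \epsilon_n. \quad (52)$$

We also have $K^{[\cdot]_m}(\tilde{x}_o) \le m \kappa_{m,n}$ and $K^{[\cdot]_m}(\hat{x}_o) \le m \kappa_{m,n}$.

Define the events $\mathcal{E}_1^{(n)}$ and $\mathcal{E}_2^{(n)}$ as done in (28) and (29) in the proof of Theorem 2. Then, applying the argument
used there, it follows that

$$P(\|\hat{x}_o - x_o\|_2 > \epsilon) \le P\big( \|\hat{x}_o - x_o\|_2 > \epsilon,\ \mathcal{E}_1^{(n)} \cap \mathcal{E}_2^{(n)} \big) + P(\mathcal{E}_1^{(n),c}) + P(\mathcal{E}_2^{(n),c}). \quad (53)$$

The rest of the proof is exactly the same as that for Theorem 2. □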

E. Proof of Theorem 6

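Let $\hat{x}_o$ be the solution of the MCP algorithm and let $q_m = x_o - \phi_m(x_o)$ and $\hat{q}_m = \hat{x}_o - \phi_m(\hat{x}_o)$ denote the
quantization errors of the original and the reconstructed signals at resolution $m$, respectively. Following exactly the
same steps as the proof of Theorem 2, we obtain

$$K^{[\cdot]_m}(\hat{x}_o) \le K^{[\cdot]_m}(x_o) \le \kappa_{m,n} m \quad (54)$$

and

$$\|A(\phi_m(x_o) - \phi_m(\hat{x}_o))\|_2 \le \sigma_{\max}(A) \sqrt{n 2^{-2m+2}}. \quad (55)$$

Since we are dealing with subgaussian random matrices, we define slightly different events here. Define the set $\mathcal{S}$ as

$$\mathcal{S} = \{h : h = \phi_m(x_o) - \phi_m(\hat{x}_o),\ x_o, \hat{x}_o \in [0, 1]^n,\ K(\phi_m(x_o)) \le \kappa_{m,n} m,\ K(\phi_m(\hat{x}_o)) \le \kappa_{m,n} m\},$$

and define

$$\mathcal{E}_1^{(n)} \triangleq \{\forall h \in \mathcal{S} : \|Ah\|_2 \ge \tau \sqrt{d}\, \|h\|_2\}, \quad (56)$$

$$\mathcal{E}_2^{(n)} \triangleq \{\sigma_{\max}(A) < \sqrt{d} + (c_2' + 1) \sqrt{n}\}, \quad (57)$$

where $c_2'$ is the constant introduced in Lemma 8. $P(\|x_o - \hat{x}_o\|_2 > \epsilon)$ can be upper bounded by

$$P(\|x_o - \hat{x}_o\|_2 > \epsilon) \le P\big( \|x_o - \hat{x}_o\|_2 > \epsilon,\ \mathcal{E}_1^{(n)} \cap \mathcal{E}_2^{(n)} \big) + P(\mathcal{E}_1^{(n),c}) + P(\mathcal{E}_2^{(n),c}).$$

If $A \in \mathcal{E}_1^{(n)} \cap \mathcal{E}_2^{(n)}$, then similar to (31) we can prove

$$\|x_o - \hat{x}_o\|_2 \le \Big( \tau^{-1} \big( (c_2' + 1) \sqrt{n d^{-1}} + 1 \big) + 1 \Big) \sqrt{n 2^{-2m+2}}.$$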
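Hence,

$$P\big( \|x_o - \hat{x}_o\|_2 > \epsilon,\ \mathcal{E}_1^{(n)} \cap \mathcal{E}_2^{(n)} \big) = 0.$$

Also, according to Lemma 8, $P(\mathcal{E}_2^{(n),c}) \le e^{-c_1' n}$. Therefore, the main difference is in the calculation of $P(\mathcal{E}_1^{(n),c})$.
For any $x \in \mathbb{R}^n$,

$$P\big( \|Ax\|_2 < \tau \sqrt{d}\, \|x\|_2 \big) = P\Big( \sum_{i=1}^d \Big( \sum_{j=1}^n A_{ij} x_j \Big)^2 < \tau^2 d \|x\|_2^2 \Big) = P\Big( \sum_{i=1}^d Z_i^2 < \tau^2 d \Big),$$

where for $i = 1, 2, \ldots, d$, $Z_i = \|x\|_2^{-1} \sum_j A_{ij} x_j$. Therefore, by Lemma 4 we obtain

$$P(|Z_i| > t) \le c_1 e^{-c_2 t^2}.$$

According to Lemma 7 we have

$$P\Big( \sum_{i=1}^d Z_i^2 < \tau^2 d \Big) \le e^{-\frac{d c_2^2 (\tau^2 - 1)^2}{16 c_3}},$$

where $c_3 = \max(c_1 e^{-c_2}, e^{c_2})$ and $1 - \tau^2 < c_3 / c_2$. Finally, the union bound proves that

$$P(\mathcal{E}_1^{(n),c}) \le 2^{2 \kappa_{m,n} m} e^{-\frac{d c_2^2 (\tau^2 - 1)^2}{16 c_3}},$$

which completes the proof. □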

VII. Conclusions 

In this paper, we have considered the problem of recovering structured signals from their linear measurements. 
We have used the Kolmogorov complexity of the quantized signal as a universal measure of complexity to both cover
many of the examples explored in the CS literature and also provide a framework to analyze future structured signal
models. We have shown that, if we consider low-complexity signals, then the minimum complexity pursuit (MCP)
scheme inspired by Occam's razor recovers the simplest solution of a set of random linear measurements. In fact,
we have proved that the number of measurements required is proportional to the complexity of the signal and to
the logarithm of its ambient dimension. We have also considered more practical scenarios where the signal is not
exactly low complexity but rather is "close" to a low complexity signal. We have shown that, even in such cases,
the MCP algorithm provides a good estimate of the signal from far fewer samples than the ambient dimension
of the signal.

As mentioned above, the Kolmogorov complexity of a sequence is not computable. However, we are currently
working on deriving implementable schemes by replacing the Kolmogorov complexity with computable measures
such as minimum description length [51].
Appendix

A. Review of Kolmogorov Complexity

1) Universal machine: In an effort to formalize the concept of computability of functions, Turing introduced 
the notion of Turing machine. A Turing machine is a device that has a finite number of states and a memory that 
is in the form of a tape. The tape consists of small blocks, each of which can store one of the three symbols 
I = {0, 1, B} where B represents a blank. The Turing machine also has a head that at each point in time points to 
one of the blocks on the tape. The machine works in discrete time steps. At every time instance, it reads one symbol 
from the tape (at the location the head is pointing to), and based on its current state and the acquired information 
from the tape, it performs the following actions: 

1) Moves either to a new state or remains at the current state. 

2) Writes one symbol from I onto the tape at the location of the head. 

3) Moves the head one block either left or right. 

The process continues until the machine enters the halting state. The output of a Turing machine M given a finite 
bit string s is defined as follows. Write the string s on the input tape of M and assume that except for these bits all 
other blocks are occupied by symbol B. Set the head of machine to the left-most block of s and run the machine 
M until it halts. If the machine does not halt on s, then M (s) is not defined. If M halts, then the tape contains a 
binary string that is surrounded by blanks. This binary string is defined as M(s), i.e., the output of M given s. If 
the output string contains blanks between the binary symbols, then they are replaced by zeros to make the output 
a binary sequence. 
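
The machine described above is small enough to simulate directly. The following is a minimal sketch, not a reference implementation: the transition-table encoding, the function name `run_turing_machine`, the step limit (used here to give up on runs that do not halt) and the example machine `FLIP_BITS` are all assumptions made for illustration.

```python
BLANK = "B"


def run_turing_machine(transitions, start, halt, tape_input, max_steps=10_000):
    """Simulate a one-tape machine; return its output string, or None if it
    has not halted within max_steps (the output is then treated as undefined).

    transitions maps (state, symbol) -> (new_state, written_symbol, move),
    where move is -1 (one block left) or +1 (one block right).
    """
    tape = dict(enumerate(tape_input))
    state, head, steps = start, 0, 0      # head starts on the left-most input block
    while state != halt:
        if steps == max_steps:
            return None
        symbol = tape.get(head, BLANK)
        state, written, move = transitions[(state, symbol)]
        tape[head] = written
        head += move
        steps += 1
    cells = [i for i, s in tape.items() if s != BLANK]
    if not cells:
        return ""
    # Blanks between binary symbols are replaced by zeros, as in the text.
    return "".join("0" if tape.get(i, BLANK) == BLANK else tape[i]
                   for i in range(min(cells), max(cells) + 1))


# Example machine: flip every bit of the input, then halt at the first blank.
FLIP_BITS = {
    ("scan", "0"): ("scan", "1", +1),
    ("scan", "1"): ("scan", "0", +1),
    ("scan", BLANK): ("halt", BLANK, +1),
}

if __name__ == "__main__":
    print(run_turing_machine(FLIP_BITS, "scan", "halt", "0110"))  # prints 1001
```

A step limit can only approximate the notion of an undefined output, since whether a machine halts on a given input is not decidable in general.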

A partial function $\phi : \{0,1\}^* \to \{0,1\}^* \cup \{0,1\}^\infty$ is called computable if there exists a Turing machine $M$ such 
that, for every $s \in \{0,1\}^*$ for which $M(s)$ is defined, $\phi(s) = M(s)$. According to the Church-Turing thesis (conjecture), if a 
function is computable in any intuitive sense, then it is computable in the above sense [52]. This thesis reduces 
the general problem of computability to computability on a Turing machine. 

One of the most fundamental results in algorithmic information theory is the existence of a universal Turing 
machine. A universal Turing machine $U$ is a machine that is able to imitate the behavior of any other Turing 
machine $M$ on any input string. The existence of universal Turing machines is a result of the fact that any Turing 
machine can be uniquely specified with a finite number of bits. We call these bits the simulator program and, for 
a Turing machine $M$, denote this bit string by $p_M$. See Chapter 1 of [11], [53] for more information on 
universal Turing machines. 

2) Kolmogorov complexity: There exist several notions of Kolmogorov complexity. In this paper, we use the one 
based on prefix recursive functions. 

Let $\phi$ be a partial recursive (partial computable) function. Define the complexity of string $s$ for function $\phi$ as 

$$K_\phi(s) \triangleq \min_{p \in \{0,1\}^* :\, \phi(p) = s} \ell(p), \qquad (58)$$

where $\ell(p)$ denotes the length of the binary string $p$. As mentioned earlier, each computable function corresponds 
to a Turing machine. Therefore, $p = \phi^{-1}(s)$ provides a uniquely decodable code for $s$. 
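
To make definition (58) concrete, the following sketch evaluates $K_\phi(s)$ by exhaustive search for a toy decoder. The decoder `phi` and the helper `complexity` are hypothetical and chosen purely for illustration: a program starting with 0 outputs the rest of its bits literally, while a program of the form 1, b, u outputs the bit b repeated u times (u read as a binary number). The toy decoder always halts, so the search is well defined here; for a general partial recursive function the same search is not computable.

```python
from itertools import product


def phi(p):
    """Toy decoder (an assumption of this sketch, not a universal machine).

    "0" + w       -> outputs w literally
    "1" + b + u   -> outputs the bit b repeated int(u, 2) times
    Returns None where the decoder is left undefined.
    """
    if not p:
        return None
    if p[0] == "0":
        return p[1:]
    if len(p) < 3:
        return None
    return p[1] * int(p[2:], 2)


def complexity(s, decoder, max_len=16):
    """Return (length, program) for a shortest p with decoder(p) == s.

    Programs are tried in order of increasing length, so the first hit
    attains the minimum in (58). Returns None if no program of length
    at most max_len produces s.
    """
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            p = "".join(bits)
            if decoder(p) == s:
                return length, p
    return None


print(complexity("00000000", phi))  # (6, '101000'): the repeat rule beats the 9-bit literal
print(complexity("0110", phi))      # (5, '00110'): only the literal rule applies
```

The regular string of eight zeros has a shorter description than its length, whereas the irregular string 0110 does not, which is the behavior the definition is meant to capture.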

The notion of complexity defined in (58) depends on the function $\phi$. However, the invariance theorem unifies all 
these definitions. 

Theorem 7: There exists a partial recursive function $U$ such that for any given partial recursive function $\phi$, we 
have 

$$K_U(s) \le K_\phi(s) + c_\phi,$$

for all $s \in \{0,1\}^*$, where the constant $c_\phi$ depends on $\phi$ but not on $s$. 

One of the main issues with this definition of Kolmogorov complexity is that the programs that generate different 
outputs may be prefixes of one another. For instance, suppose that our goal is to print two strings $s_1$ and $s_2$ with 
this machine. Let $p_1$ and $p_2$ denote the programs that generate $s_1$ and $s_2$, respectively. We might expect the following 
strategy to yield a program that generates $s_1 s_2$: i) use a constant number of bits (independent of $s_1$ and $s_2$) to describe 
the goal of generating the concatenation of two strings; ii) provide $p_1$ and $p_2$. However, this program does not 
print $(s_1, s_2)$. In fact, since either $p_1$ or $p_2$ could be a prefix of the other, the machine cannot decide where 
to stop and print the first string. Therefore, the machine also requires the length of the first program to be able to 
determine where to stop. This means that $K_U(s_1, s_2) \le K_U(s_1) + K_U(s_2) + \log^*(K_U(s_1)) + c$. This $\log^*$ factor 
appears in many other places, such as the calculation of mutual information. A simple remedy for this problem is 
to define the prefix Kolmogorov complexity described in the next section. 

3) Prefix Kolmogorov complexity: 

Definition 4: A partial recursive function $\phi : \{0,1\}^* \to \{0,1\}^* \cup \{0,1\}^\infty$ is called a partial recursive prefix 
function if and only if $\phi(p) < \infty$ and $\phi(q) < \infty$ imply that neither $p$ nor $q$ is a proper prefix of the other. 
Let $\phi$ be a partial recursive prefix function. Define the complexity of string $s$ for function $\phi$ as 

$$K_\phi(s) = \min_{p :\, s = \phi(p)} \ell(p).$$

As before, an invariance theorem holds for $K_\phi(s)$ [11] (Section 3.1). See [11] (Chapter 3) for several other 
appealing properties of the prefix Kolmogorov complexity. 
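
The role of the prefix condition can be seen in a small experiment. The sketch below is illustrative only: the program sets and the helper names `is_prefix_free` and `parses` are assumptions of this example. It lists every way of splitting a concatenated bit stream into programs from a given set; without the prefix condition the split can be ambiguous, while with it the machine can tell where the first program ends.

```python
def is_prefix_free(programs):
    """True if no program in the set is a proper prefix of another."""
    return not any(p != q and q.startswith(p) for p in programs for q in programs)


def parses(stream, programs):
    """Return every way to split stream into a sequence of (non-empty) programs."""
    if stream == "":
        return [[]]
    results = []
    for p in programs:
        if stream.startswith(p):
            results += [[p] + rest for rest in parses(stream[len(p):], programs)]
    return results


plain = ["0", "01", "10"]     # "0" is a prefix of "01"
prefix = ["00", "01", "10"]   # no program is a prefix of another

print(is_prefix_free(plain), parses("010", plain))
# False [['0', '10'], ['01', '0']]   -> two readings of the same stream
print(is_prefix_free(prefix), parses("0110", prefix))
# True [['01', '10']]                -> a unique reading
```

In the first case the machine would need extra information, such as the length of the first program, to choose between the two readings; in the second case no such information is needed.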



B. Proof of Theorem [7] 

i. The following program prints $x$: "Print the following bit sequence: $x_1, x_2, \ldots, x_{\ell(x)}$." The first part, which explains 
the structure, has a constant length $c$, and the bits themselves require $\ell(x)$ bits. Therefore, the length of 
the program is at most $\ell(x) + c$. 

ii. Let $p_x$ and $p_y$ denote the shortest programs that print $x$ and $y$, respectively. The following program prints 
$(x, y)$: "Print a concatenation of two numbers", followed by the programs $p_x$ and $p_y$ for these two numbers. 
Note that since the programs are assumed to be prefix free, after the explanation "Print a concatenation of 
two numbers" the machine continues until it goes into the halting state. At this point it has already printed 
$x$. But since it knows that we expect another number, it starts to read bits again and therefore prints 
$y$ as well. 

iii. The proof of this part is also straightforward since we can ignore our information on y and code x as if we 
do not know y. 

iv. We use the same program that we used in Part i. Notice that since the machine does not know $\ell(x)$, we should 
spend $K(\ell(x))$ bits to describe this number as well. Hence, overall we require $K(x \mid \ell(x)) + K(\ell(x)) + c$ bits. 

v. First note that the length of the binary representation of $n$, denoted by $\ell(n)$, is $\log n$ up to an additive 
constant of at most one. According to Part iv we have 

$$K(n) \le K(n \mid \ell(n)) + K(\ell(n)) + c \le \ell(n) + 2\max(\log\log n, 1) + c' \le \log n + 2\max(\log\log n, 1) + c'. \qquad (59)$$

vi. The proof is very similar to the proof of Part ii, and hence we skip it. 
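
The two-stage description used in Parts iv and v (first describe $\ell(n)$, then describe $n$ given $\ell(n)$) can be illustrated with a concrete self-delimiting integer code. The sketch below uses the Elias delta code, which is a choice made for this example and is not the construction referred to in the proof; the function names `encode` and `decode` and the constant 3 in the final check are likewise assumptions of the sketch. Its codeword length has the same $\log n + 2\log\log n$ shape as (59), with logarithms taken to base 2.

```python
import math


def encode(n):
    """Self-delimiting (Elias delta) codeword for an integer n >= 1."""
    body = bin(n)[2:]             # binary representation of n, l(n) bits
    length = bin(len(body))[2:]   # l(n) itself, written in binary
    # Unary prefix announcing how many bits l(n) takes, then l(n),
    # then n without its leading 1 (which is implied).
    return "0" * (len(length) - 1) + length + body[1:]


def decode(stream):
    """Read one codeword from the front of stream; return (n, remainder)."""
    zeros = 0
    while stream[zeros] == "0":
        zeros += 1
    start = 2 * zeros + 1
    n_bits = int(stream[zeros:start], 2)              # recovered l(n)
    body = "1" + stream[start:start + n_bits - 1]     # restore the leading 1
    return int(body, 2), stream[start + n_bits - 1:]


# Two codewords can be concatenated and still be separated without extra
# length information, as in the argument of Part ii.
stream = encode(10) + encode(3)
first, rest = decode(stream)
second, rest = decode(rest)
print(first, second, repr(rest))  # 10 3 ''

# The codeword length stays within log n + 2 max(log log n, 1) + 3 bits.
for n in range(2, 5000):
    bound = math.log2(n) + 2 * max(math.log2(math.log2(n)), 1) + 3
    assert len(encode(n)) <= bound
```

Since a fixed prefix machine can run `decode`, the length of `encode(n)` plus a constant is an upper bound on the prefix complexity of $n$ relative to that machine.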